1

In this toy reproducible example below, I have a data.frame with two columns.

  • id
  • dump: a string variable that acts as a "dump" for a bunch of characteristics of id and can have an arbitrary number of elements in each row. In dump, characteristics are separated by ";" (obs: dump is Postgresql speak, don't know of a native R term for this).

    library(data.table)
    original_data <- data.table(id=1:3,dump=c('a;b;c','x','t,y'))

I want to reshape this into a tidy format, with one row per id-characteristic pair. Preferably a data.table solution for performance.

This is the desired output

desired_data <- data.table(id=c(1,1,1,2,3,3),unduped=c('a','b','c','x','t','y'))
LucasMation
  • 2,408
  • 2
  • 22
  • 45
  • Looks like https://stackoverflow.com/questions/30525811/how-to-separate-comma-separated-values-in-r-in-a-new-row or https://stackoverflow.com/questions/12481418/separate-comma-delimited-cells-to-new-rows or https://stackoverflow.com/questions/13773770/split-comma-separated-strings-in-a-column-into-separate-rows or https://stackoverflow.com/questions/15347282/split-delimited-strings-in-a-column-and-insert-as-new-rows as possible duplicates – thelatemail Jun 03 '19 at 22:10

1 Answers1

0

We can use separate_rows

library(tidyr)
library(dplyr)
library(data.table)
original_data %>%
      separate_rows(dump)

Or using strsplit

original_data[, {tmp <- strsplit(dump, "[;,]")
     .(id = rep(id, lengths(tmp)), unduped = unlist(tmp))}]
akrun
  • 874,273
  • 37
  • 540
  • 662