Unblundle (weird reshape long) id-"dump-var" data into a long data.frame/table in R

Question

In this toy reproducible example below, I have a data.frame with two columns.

id
dump: a string variable that acts as a "dump" for a bunch of characteristics of id and can have an arbitrary number of elements in each row. In dump, characteristics are separated by ";" (obs: dump is Postgresql speak, don't know of a native R term for this).

library(data.table)
original_data <- data.table(id=1:3,dump=c('a;b;c','x','t,y'))

I want to reshape this into a tidy format, with one row per id-characteristic pair. Preferably a data.table solution for performance.

This is the desired output

desired_data <- data.table(id=c(1,1,1,2,3,3),unduped=c('a','b','c','x','t','y'))

Looks like https://stackoverflow.com/questions/30525811/how-to-separate-comma-separated-values-in-r-in-a-new-row or https://stackoverflow.com/questions/12481418/separate-comma-delimited-cells-to-new-rows or https://stackoverflow.com/questions/13773770/split-comma-separated-strings-in-a-column-into-separate-rows or https://stackoverflow.com/questions/15347282/split-delimited-strings-in-a-column-and-insert-as-new-rows as possible duplicates — thelatemail, Jun 03 '19 at 22:10

score 0 · Accepted Answer · answered Jun 03 '19 at 21:50

0

We can use separate_rows

library(tidyr)
library(dplyr)
library(data.table)
original_data %>%
      separate_rows(dump)

Or using strsplit

original_data[, {tmp <- strsplit(dump, "[;,]")
     .(id = rep(id, lengths(tmp)), unduped = unlist(tmp))}]

answered Jun 03 '19 at 21:50

akrun

1 Answers1