I receive some data as a data frame with two columns: idset and elems.

The first column is an integer and the second is a string containing IDs separated by commas, as in the following example:

idset <- c(1111,2222,3333)
elems <- c('1,2,3', '1,3,5,7,9', '4,6')
df <- data.frame(idset, elems, stringsAsFactors = FALSE)

So df is:

  idset     elems
1  1111     1,2,3
2  2222 1,3,5,7,9
3  3333       4,6

I would like to have a data frame (or matrix, or named list) with a single element per row (like a "long" table):

   idset elems
1   1111     1
2   1111     2
3   1111     3
4   2222     1
5   2222     3
6   2222     5
7   2222     7
8   2222     9
9   3333     4
10  3333     6

I know I can do it with nested loops, but I was wondering whether there is a convenient function that provides a better solution.

Thank you all!

Vecino

1 Answer

You could use strsplit() and lapply().

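# Split each string on commas; l holds one character vector per row of df.
l <- strsplit(df$elems, ",")
# Pair every idset with its elements and stack the pieces into one data frame.
df1 <- do.call(rbind, lapply(seq_along(l), function(x)
  data.frame(idset = df$idset[x], elems = l[[x]], stringsAsFactors = FALSE)))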

Yielding

> df1
   idset elems
1   1111     1
2   1111     2
3   1111     3
4   2222     1
5   2222     3
6   2222     5
7   2222     7
8   2222     9
9   3333     4
10  3333     6

Data

> dput(df)
structure(list(idset = c(1111, 2222, 3333), elems = c("1,2,3", 
"1,3,5,7,9", "4,6")), class = "data.frame", row.names = c(NA, 
-3L))
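
As a possible alternative, here is a minimal base R sketch that avoids binding pieces together row by row: split the strings once, repeat each idset as many times as its string has elements (via lengths()), and flatten the pieces with unlist(). The names parts and long, and the conversion of elems to integer, are assumptions of this sketch rather than anything the question requires.

```r
# Example data from the question.
df <- data.frame(
  idset = c(1111, 2222, 3333),
  elems = c("1,2,3", "1,3,5,7,9", "4,6"),
  stringsAsFactors = FALSE
)

# One character vector of IDs per row of df.
parts <- strsplit(df$elems, ",", fixed = TRUE)

# Repeat each idset once per element, and flatten the elements in the same order.
long <- data.frame(
  idset = rep(df$idset, lengths(parts)),
  elems = as.integer(unlist(parts))  # assumption: the IDs are whole numbers
)
```

Drop the as.integer() call if the IDs should stay as character strings.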
jay.sf