I am dealing with a large existing data frame that has a column that I need to "transpose" into multiple rows, while retaining the original user Id in each row.

Note that each value in the favs column is literally a string that includes "c( ... )". A simplified version is shown here:

 **uId**   **favs**
 1000     c('pizza')
 1001     c('seafood','steaks')
 1002     NA
 1003     c('sushi','strawberries')

The output I want:

 **uId** **favs**
 1000   pizza
 1001   seafood
 1001   steaks
 1002   NA
 1003   sushi
 1003   strawberries

What is the most efficient way to do this? I was considering melt/dcast, but I am not sure how to apply it here, since the favs column needs to be unlisted and will then contain a varying number of elements.

ScottP

1 Answer

We can use unnest from tidyr, which expands a list column into one row per element:

library(tidyr)
# tidyr >= 1.0.0 expects the column to unnest to be named via cols
unnest(df1, cols = favs)
#    uID         favs
#1 1000        pizza
#2 1001      seafood
#3 1001       steaks
#4 1002         <NA>
#5 1003        sushi
#6 1003 strawberries

Or with base R: repeat each uID by the number of elements in its 'favs' entry (given by lengths), and unlist the 'favs' column

data.frame(uID = rep(df1$uID, lengths(df1$favs)), favs = unlist(df1$favs))

This should be fast, since both rep and unlist are implemented internally in compiled code and no explicit R-level loop is needed.
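
To see why the two vectors line up, it can help to look at the intermediate values. The sketch below rebuilds the example as two standalone objects (uID, favs, n, ids, vals and long are illustrative names only). It assumes every list element holds at least one value; a zero-length element would contribute nothing to either vector, so that row would silently disappear.

```r
# Illustrative copies of the two columns (same values as the example data)
uID  <- 1000:1003
favs <- list('pizza', c('seafood', 'steaks'), NA, c('sushi', 'strawberries'))

# Number of values held by each row's list element
n <- lengths(favs)          # 1 2 1 2

# Repeat each id once per value, and flatten the list in the same order
ids  <- rep(uID, n)         # 1000 1001 1001 1002 1003 1003
vals <- unlist(favs)        # "pizza" "seafood" "steaks" NA "sushi" "strawberries"

# Both vectors now have one entry per (id, value) pair
long <- data.frame(uID = ids, favs = vals, stringsAsFactors = FALSE)
```

Because rep and unlist walk the rows in the same order, position i of ids always belongs to position i of vals.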

data

df1 <- data.frame(uID = 1000:1003,
     favs = I(list('pizza', c('seafood', 'steaks'), NA,
     c('sushi', 'strawberries'))))
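
The data above stores favs as a list column, whereas the question describes it as text such as "c('seafood','steaks')". If the column really is a string, it has to be split first. The following is only a sketch under assumptions: df_chr is a hypothetical stand-in for the real data, and the individual items are assumed to contain no commas or single quotes of their own.

```r
# Hypothetical input: favs stored as text, as described in the question
df_chr <- data.frame(
  uID  = 1000:1003,
  favs = c("c('pizza')", "c('seafood','steaks')", NA, "c('sushi','strawberries')"),
  stringsAsFactors = FALSE
)

# Strip the leading "c(", the trailing ")" and the single quotes; NA stays NA
inner <- gsub("^c\\(|\\)$|'", "", df_chr$favs)

# Split on commas to get one character vector per row, trimming stray spaces
favs_list <- lapply(strsplit(inner, ",", fixed = TRUE), trimws)

# Same rep/unlist step as in the base R approach
long_chr <- data.frame(
  uID  = rep(df_chr$uID, lengths(favs_list)),
  favs = unlist(favs_list),
  stringsAsFactors = FALSE
)
```

Text manipulation is used here instead of eval(parse(text = ...)) so that nothing stored in the column gets executed as code.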
akrun