I have some data read in from an Excel spreadsheet. The curators are not familiar with relational databases or with handling 1-many relationships, so they have put multiple variables in one column:
> df <- data.frame(id=c("X1", "X23", "X5"), vars=c("foo, bar, hello", "world", NA), var2=c(1,2,3))
> df
   id            vars var2
1  X1 foo, bar, hello    1
2 X23           world    2
3  X5            <NA>    3
I want to transform the vars column to a new data frame so I can have a 1-many relation:
> df
     id var2
X1   X1    1
X23 X23    2
X5   X5    3
> df2
   id   var
1  X1   foo
2  X1   bar
3  X1 hello
4 X23 world
I am able to parse the vars column into a list where each entry is a vector of variables:
> library(stringr)
> halfway <- str_split(df$vars, pattern=", ")
> halfway
[[1]]
[1] "foo"   "bar"   "hello"

[[2]]
[1] "world"

[[3]]
[1] NA
but I'm unsure how to take this list and transform it to a long data.frame.
I've had a play around, but I can't get it into the long format without losing information about the IDs each of the variables belongs to (using unlist).
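To illustrate the problem, here is a minimal sketch of what happens when the list is flattened. It assumes base R's strsplit in place of str_split so that it runs without extra packages, and the name flat is only illustrative:

```r
# Same example data as above; stringsAsFactors = FALSE keeps vars as character.
df <- data.frame(id = c("X1", "X23", "X5"),
                 vars = c("foo, bar, hello", "world", NA),
                 var2 = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Split each entry on the literal separator ", "; an NA entry stays a single NA.
halfway <- strsplit(df$vars, ", ", fixed = TRUE)

# unlist() flattens the list into one character vector, so nothing in the
# result records which id each value came from.
flat <- unlist(halfway)
print(flat)
```

The values are all there, but the link back to df$id is gone.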
I've also looked at reshape but it doesn't seem to do what I want.
I could use a for loop to iteratively build up the new table, but that's horribly inefficient. Is there an elegant solution for this?
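For reference, the loop approach I would like to avoid looks roughly like this. It is only a sketch: it assumes base R's strsplit in place of str_split, and pieces is just an illustrative name.

```r
# Same example data as above; stringsAsFactors = FALSE keeps vars as character.
df <- data.frame(id = c("X1", "X23", "X5"),
                 vars = c("foo, bar, hello", "world", NA),
                 var2 = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# One character vector per row of df.
halfway <- strsplit(df$vars, ", ", fixed = TRUE)

# Start with an empty long-format table and append to it row by row.
df2 <- data.frame(id = character(0), var = character(0),
                  stringsAsFactors = FALSE)

for (i in seq_along(halfway)) {
  pieces <- halfway[[i]]
  pieces <- pieces[!is.na(pieces)]  # rows with no variables contribute nothing
  if (length(pieces) > 0) {
    # rbind() copies df2 on every iteration, which is why this scales badly.
    df2 <- rbind(df2,
                 data.frame(id = df$id[i], var = pieces,
                            stringsAsFactors = FALSE))
  }
}

print(df2)
```

This gives the table I want, but it grows the data frame one chunk at a time.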