My input data have a character column 'FP' containing a comma-separated list of features (in this case it's numbers, but it could be anything).
t0 <- structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), DO = c(0.733,
0.02, 0.086, 0.84, 0.25, 0.537, 0.07, 0.135, 0.201, 0.611), Z = c(2.12,
-0.966, -0.028, 1.239, 1.831, -2.138, -2.233, 0.791, -2.111,
-2.212), SC = c("D", "A", "A", "E", "B", "C", "A", "A", "B",
"D"), FP = c("10,49,28", "41,7,30,2,34,43", "22,35,31,10,3",
"29,6,15", "1,34,43,37", "11,18,10,12,32,17", "15,49", "40,41,37,8,11",
"27,12,20,35,45", "49,28,16,41")), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L), spec = structure(list(
cols = list(ID = structure(list(), class = c("collector_double",
"collector")), DO = structure(list(), class = c("collector_double",
"collector")), Z = structure(list(), class = c("collector_double",
"collector")), SC = structure(list(), class = c("collector_character",
"collector")), FP = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
I want to create a new data.frame where each FP string is 'expanded', i.e. turned into a vector basically reversing the implicit paste operation, and then 'unmerged', i.e. creating, from each initial row, as many new rows as there are elements of FP.
I can do the first part:
t1 <- t0
t1$FP <- sapply(t1$FP,FUN=function(x) unlist(strsplit(x,",")))
#showing only the first 2 rows
as.data.frame(t1[1:2,])
# ID DO Z SC FP
#1 1 0.733 2.120 D 10, 49, 28
#2 2 0.020 -0.966 A 41, 7, 30, 2, 34, 43
And then I am stuck. I need to obtain something like this:
#showing the result only for the first 2 rows
rbind(merge(t1[1,1:4],data.frame(FP=unlist(t1[1,5])),all=T),merge(t1[2,1:4],data.frame(FP=unlist(t1[2,5])),all=T))
# ID DO Z SC FP
#1 1 0.733 2.120 D 10
#2 1 0.733 2.120 D 49
#3 1 0.733 2.120 D 28
#4 2 0.020 -0.966 A 41
#5 2 0.020 -0.966 A 7
#6 2 0.020 -0.966 A 30
#7 2 0.020 -0.966 A 2
#8 2 0.020 -0.966 A 34
#9 2 0.020 -0.966 A 43
Any idea how to do this better?
Thanks!