0

My input data have a character column 'FP' containing a comma-separated list of features (in this case it's numbers, but it could be anything).

t0 <- structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), DO = c(0.733, 
0.02, 0.086, 0.84, 0.25, 0.537, 0.07, 0.135, 0.201, 0.611), Z = c(2.12, 
-0.966, -0.028, 1.239, 1.831, -2.138, -2.233, 0.791, -2.111, 
-2.212), SC = c("D", "A", "A", "E", "B", "C", "A", "A", "B", 
"D"), FP = c("10,49,28", "41,7,30,2,34,43", "22,35,31,10,3", 
"29,6,15", "1,34,43,37", "11,18,10,12,32,17", "15,49", "40,41,37,8,11", 
"27,12,20,35,45", "49,28,16,41")), class = c("spec_tbl_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -10L), spec = structure(list(
    cols = list(ID = structure(list(), class = c("collector_double", 
    "collector")), DO = structure(list(), class = c("collector_double", 
    "collector")), Z = structure(list(), class = c("collector_double", 
    "collector")), SC = structure(list(), class = c("collector_character", 
    "collector")), FP = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

I want to create a new data.frame where each FP string is 'expanded', i.e. turned into a vector basically reversing the implicit paste operation, and then 'unmerged', i.e. creating, from each initial row, as many new rows as there are elements of FP.

I can do the first part:

t1 <- t0
t1$FP <- sapply(t1$FP,FUN=function(x) unlist(strsplit(x,",")))
#showing only the first 2 rows
as.data.frame(t1[1:2,])
#  ID    DO      Z SC                   FP
#1  1 0.733  2.120  D           10, 49, 28
#2  2 0.020 -0.966  A 41, 7, 30, 2, 34, 43

And then I am stuck. I need to obtain something like this:

#showing the result only for the first 2 rows
rbind(merge(t1[1,1:4],data.frame(FP=unlist(t1[1,5])),all=T),merge(t1[2,1:4],data.frame(FP=unlist(t1[2,5])),all=T))
#  ID    DO      Z SC FP
#1  1 0.733  2.120  D 10
#2  1 0.733  2.120  D 49
#3  1 0.733  2.120  D 28
#4  2 0.020 -0.966  A 41
#5  2 0.020 -0.966  A  7
#6  2 0.020 -0.966  A 30
#7  2 0.020 -0.966  A  2
#8  2 0.020 -0.966  A 34
#9  2 0.020 -0.966  A 43

Any idea how to do this better?

Thanks!

user6376297
  • 575
  • 2
  • 15
  • The simplest way would be `tidyr::separate_rows(t0,FP, sep = ",")` or check the marked link for more options. – Ronak Shah Jun 14 '19 at 08:24
  • Thank you Ronak! I had not found those answers, as it had not occurred to me that 'split' was what I thought of as 'unmerge'. Should I delete my post? – user6376297 Jun 14 '19 at 08:44
  • No..you can keep the post. It might help future readers who use the same terms as you like "unmerge" etc and land up on this post. It will help them to find relevant post easily then. – Ronak Shah Jun 14 '19 at 09:01

0 Answers0