-1

I have a data set with millions of rows. The first row has an ID, though there are repeated IDs in the data set (all IDs are grouped and ordered). The data set has multiple columns. I would like to transform the data such that there is one row item per ID, and all the previous entries of the columns for the ID are put into a single row, in order.

See example snippet of the before data

What the data currently looks like

And example of what I would like the data to look like What the entries should look like

Here is an example of a very similar problem, however in this problem the data only has two columns (one column for the ID), but my data has over 5 columns (and one column for ID): Collapse mutiple rows of a dataframe into one row - based on a unique key

I would like to do this in either R or Excel :)

Marco Sandri
  • 23,289
  • 7
  • 54
  • 58
user5211911
  • 51
  • 1
  • 11

1 Answers1

0

In R, we can do this with dcast from data.table

library(data.table)
dcast(setDT(df1), ID ~ rowid(ID), value.var = c("V1", "V2"), fill = "")
#   ID V1_1 V1_2 V1_3 V2_1 V2_2 V2_3
#1:  1    a    b    c   aa   bb   cc
#2:  2    d    e        dd   ee     
#3:  3    f             ff          

data

df1 <- structure(list(ID = c(1, 1, 1, 2, 2, 3), V1 = c("a", "b", "c", 
 "d", "e", "f"), V2 = c("aa", "bb", "cc", "dd", "ee", "ff")), .Names = c("ID", 
"V1", "V2"), row.names = c(NA, -6L), class = "data.frame")
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks, this worked very well. However, when I tried to run it again the very next day with the same code (I'd saved the code the night before), I got an error which said:Error in .subset2(x, i, exact = exact) : subscript out of bounds In addition: Warning message: In if (!(value.var %in% names(data))) { : the condition has length > 1 and only the first element will be used – user5211911 Jun 13 '17 at 15:36
  • @user5211911 Can you try it based on the data on by post – akrun Jun 13 '17 at 18:40
  • Yes, it does so even when I use the data from your post. I copy pasted the data from your post into R but it did not work and gave me the same error. Though this is strange, as it worked the first time I tried it but now it doesn't work – user5211911 Jun 14 '17 at 16:06
  • 2
    Okay I just turned my computer on and off again and the code worked now...quite strange – user5211911 Jun 14 '17 at 16:25