3

I have a dataframe in R with 2186 obs of 38 vars. Rows have an ID variable referring to unique experiments and using

length(unique(df$ID))==nrow(df)

n_occur<-data.frame(table(df$ID))

I know 327 of my rows have repeated IDs with some IDs repeated more than once. I am trying to merge rows with the same ID as these aren't duplicates but just second, third etc. observations within a given experiment.

So for example if I had

x y ID
1 2 a
1 3 b
2 4 c
1 3 d
1 4 a
3 2 b
2 3 a

I would like to end up with

x y ID x2 y2 ID2 x3 y3 ID3
1 2 a  1  4  a   2  3  a
1 3 b  3  2  b  na na na
2 4 c  na na na na na na
1 3 d  na na na na na na

I've seen similar questions for SQL and php but this hasn't helped me with my attempts in R. Any help would be gratefully appreciated.

Jaap
  • 81,064
  • 34
  • 182
  • 193
suuz
  • 43
  • 1
  • 7
  • I think you can use the merge function for this. – johannes_lalala Feb 03 '16 at 16:51
  • 1
    Also look at [here](http://stackoverflow.com/questions/11608167/cast-multiple-value-columns) or [here](http://stackoverflow.com/questions/26019915/how-to-spread-or-cast-multiple-values-in-r) – akrun Feb 03 '16 at 17:09
  • @akrun I reopened because the original was not a good one imo. The 2nd link you provided is much better. Marked again as duplicate now. – Jaap Feb 03 '16 at 17:27
  • Also other dupe links are [here](http://stackoverflow.com/questions/23056328/can-the-value-var-in-dcast-be-a-list-or-have-multiple-value-variables) or [here](http://stackoverflow.com/questions/27247078/reshape-multiple-values-at-once-in-r) or [here](http://stackoverflow.com/questions/30314356/dcast-with-empty-rhs) – akrun Feb 03 '16 at 17:29

1 Answers1

4

You could use the enhanced dcast function from the package for that where you can select multiple value variables. With setDT(mydf) you convert your dataframe to a datatable and with [, idx := 1:.N, by = ID] you add a index by ID which you use subsequently in the dcast formula:

library(data.table)
dcast(setDT(mydf)[, idx := 1:.N, by = ID], ID ~ idx, value.var = c("x","y"))

Or with the development version of data.table (v1.9.7+), you can use the new rowid function:

dcast(setDT(mydf), ID ~ rowid(ID), value.var = c("x","y"))

gives:

   ID x_1 x_2 x_3 y_1 y_2 y_3
1:  a   1   1   2   2   4   3
2:  b   1   3  NA   3   2  NA
3:  c   2  NA  NA   4  NA  NA
4:  d   1  NA  NA   3  NA  NA

Used data:

mydf <- structure(list(x = c(1L, 1L, 2L, 1L, 1L, 3L, 2L), y = c(2L, 3L, 
4L, 3L, 4L, 2L, 3L), ID = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 
1L), .Label = c("a", "b", "c", "d"), class = "factor")), .Names = c("x", 
"y", "ID"), class = "data.frame", row.names = c(NA, -7L))
Jaap
  • 81,064
  • 34
  • 182
  • 193