0

I have the following dataset

data <- cbind(c(1,1,1,2,3,3,3,4,4,5,5,5,5),
c(1112,1164,1339,395,1297,1440,1944,217,625,561,765,1022,1252))

I would like the result to look like this idealy in two columns rather than one column per value

        [,1] [,2]
 [1,]    1 1112,1164,1339
 [2,]    2  395
 [3,]    3 1297,1440,1944
 [4,]    4  217,625
 [5,]    5  561,765,1022,1252
ulrich
  • 3,547
  • 5
  • 35
  • 49
  • If you state why you want to do this you might get better answers. – John Nov 28 '14 at 08:30
  • @John, what's wrong with the existing answers? :) – talat Nov 28 '14 at 08:33
  • There's nothing wrong with the existing answers if this is exactly what the questioner needs. But more than half the time I get requests like this what was needed is not what was requested (as hinted at by Spaceman's first sentence). Also, it would be good to have it in the record what the reason for such requests is (see the FAQ on good questions). And, it would be more likely to raise the questionners reputation if it was more complete. And... (I think that's enough) – John Nov 28 '14 at 15:45

3 Answers3

2

Since your second column isn't a number you can't have this in a numerical matrix. The numbers will have to be converted to strings and the whole thing stored in a data frame.

> d=data.frame(data)
> d
   X1   X2
1   1 1112
2   1 1164
3   1 1339
4   2  395
5   3 1297
6   3 1440
[etc]

Now we just use dplyr and paste together the X2 values in each X1 category:

> require(dplyr)
> d %>% group_by(X1) %>% summarise(X2=paste(X2,collapse=","))
Source: local data frame [5 x 2]

  X1                X2
1  1    1112,1164,1339
2  2               395
3  3    1297,1440,1944
4  4           217,625
5  5 561,765,1022,1252

Note that what is being held in X2 is a string "1112,1164,1339", so to get the numeric values back out you need to split the string on the comma and convert to numeric.

I would only use this conversion for displaying the data, its not a useful format for further processing.

You can store multiple values in elements of a column, but I've always found it breaks some functions' expectations of what can be in a data frame...

So starting with d you can do:

> dwide = data.frame(X1=unique(d$X1), X2=tapply(d$X2, factor(d$X1),c))
> dwide
  X1                   X2
1  1     1112, 1164, 1339
2  2                  395
3  3     1297, 1440, 1944
4  4             217, 625
5  5 561, 765, 1022, 1252

And then you can access the numerical elements directly, but make sure you get the number of square brackets right:

> dwide$X2[[3]][2]
[1] 1440
Spacedman
  • 92,590
  • 12
  • 140
  • 224
2

Here's a base R option (using a data.frame as input):

aggregate(X2 ~ X1, data, FUN = toString)
#  X1                   X2
#1  1     1112, 1164, 1339
#2  2                  395
#3  3     1297, 1440, 1944
#4  4             217, 625
#5  5 561, 765, 1022, 1252
talat
  • 68,970
  • 21
  • 126
  • 157
2

Or using data.table

 library(data.table)
 setDT(as.data.frame(data))[,toString(V2), by=V1]
 #   V1                   V1
 #1:  1     1112, 1164, 1339
 #2:  2                  395
 #3:  3     1297, 1440, 1944
 #4:  4             217, 625
 #5:  5 561, 765, 1022, 1252
akrun
  • 874,273
  • 37
  • 540
  • 662