Aggregate multiple values in a single colum r

Question

I have the following dataset

data <- cbind(c(1,1,1,2,3,3,3,4,4,5,5,5,5),
c(1112,1164,1339,395,1297,1440,1944,217,625,561,765,1022,1252))

I would like the result to look like this idealy in two columns rather than one column per value

        [,1] [,2]
 [1,]    1 1112,1164,1339
 [2,]    2  395
 [3,]    3 1297,1440,1944
 [4,]    4  217,625
 [5,]    5  561,765,1022,1252

If you state why you want to do this you might get better answers. — John, Nov 28 '14 at 08:30
There's nothing wrong with the existing answers if this is exactly what the questioner needs. But more than half the time I get requests like this what was needed is not what was requested (as hinted at by Spaceman's first sentence). Also, it would be good to have it in the record what the reason for such requests is (see the FAQ on good questions). And, it would be more likely to raise the questionners reputation if it was more complete. And... (I think that's enough) — John, Nov 28 '14 at 15:45

Spacedman · Accepted Answer · 2014-11-28T08:22:19.567

Since your second column isn't a number you can't have this in a numerical matrix. The numbers will have to be converted to strings and the whole thing stored in a data frame.

> d=data.frame(data)
> d
   X1   X2
1   1 1112
2   1 1164
3   1 1339
4   2  395
5   3 1297
6   3 1440
[etc]

Now we just use dplyr and paste together the X2 values in each X1 category:

> require(dplyr)
> d %>% group_by(X1) %>% summarise(X2=paste(X2,collapse=","))
Source: local data frame [5 x 2]

  X1                X2
1  1    1112,1164,1339
2  2               395
3  3    1297,1440,1944
4  4           217,625
5  5 561,765,1022,1252

Note that what is being held in X2 is a string "1112,1164,1339", so to get the numeric values back out you need to split the string on the comma and convert to numeric.

I would only use this conversion for displaying the data, its not a useful format for further processing.

You can store multiple values in elements of a column, but I've always found it breaks some functions' expectations of what can be in a data frame...

So starting with d you can do:

> dwide = data.frame(X1=unique(d$X1), X2=tapply(d$X2, factor(d$X1),c))
> dwide
  X1                   X2
1  1     1112, 1164, 1339
2  2                  395
3  3     1297, 1440, 1944
4  4             217, 625
5  5 561, 765, 1022, 1252

And then you can access the numerical elements directly, but make sure you get the number of square brackets right:

> dwide$X2[[3]][2]
[1] 1440

score 2 · Answer 2 · answered Nov 28 '14 at 08:23

Here's a base R option (using a data.frame as input):

aggregate(X2 ~ X1, data, FUN = toString)
#  X1                   X2
#1  1     1112, 1164, 1339
#2  2                  395
#3  3     1297, 1440, 1944
#4  4             217, 625
#5  5 561, 765, 1022, 1252

score 2 · Answer 3 · answered Nov 28 '14 at 10:03

Or using data.table

 library(data.table)
 setDT(as.data.frame(data))[,toString(V2), by=V1]
 #   V1                   V1
 #1:  1     1112, 1164, 1339
 #2:  2                  395
 #3:  3     1297, 1440, 1944
 #4:  4             217, 625
 #5:  5 561, 765, 1022, 1252

Aggregate multiple values in a single colum r

3 Answers3

Linked

Related