Been searching around but with no luck so far.

Here is the data frame.

> test = data.frame(x = c(1,1,2,2,3,3), y = c('a','b','c','d','e','f'))
> test
  x y
1 1 a
2 1 b
3 2 c
4 2 d
5 3 e
6 3 f

Was looking for a way to aggregate such that the y values sharing the same x value will be collected into a list or vector.

Something like

  x y
1 1 a,b
2 2 c,d
3 3 e,f

Tried 'c', but the result is not what was expected:

> aggregate(y~x, data = test, FUN = 'c')
  x y.1 y.2
1 1   1   2
2 2   3   4
3 3   5   6

'list' seems to work, but it converts character to factor, though.

> ss = aggregate(y~x, data = test, FUN = 'list')
> class(ss$y[1][[1]])
[1] "factor"
> ss$y[1]
$`1`
[1] a b
Levels: a b c d e f

Any comments are appreciated, thank you.

user2165

3 Answers

The column 'y' in the 'test' data is already a factor (as mentioned by @BondedDust), because the default setting in the data.frame call is stringsAsFactors=TRUE (this was the default before R 4.0.0; later versions default to FALSE). So aggregate is not converting character to factor. If we use stringsAsFactors=FALSE while creating the data.frame, the class will be character and will remain so.

test <- data.frame(x = c(1,1,2,2,3,3), y = c('a','b','c','d','e','f'),
           stringsAsFactors = FALSE)
# collect the 'y' values of each 'x' group into a list element
res <- aggregate(y ~ x, data = test, FUN = 'list')
str(res)
# 'data.frame':  3 obs. of  2 variables:
#  $ x: num  1 2 3
#  $ y:List of 3
#   ..$ 1: chr  "a" "b"
#   ..$ 2: chr  "c" "d"
#   ..$ 3: chr  "e" "f"
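To make the point about the column class concrete, here is a minimal sketch that builds the same data twice with stringsAsFactors set explicitly, so the result should not depend on the default of the R version in use. The names `test_factor`, `test_char`, `res_factor` and `res_char` are made up for this illustration.

```r
# Same data, once with factors and once with plain character strings.
test_factor <- data.frame(x = c(1, 1, 2, 2, 3, 3),
                          y = c("a", "b", "c", "d", "e", "f"),
                          stringsAsFactors = TRUE)
test_char <- data.frame(x = c(1, 1, 2, 2, 3, 3),
                        y = c("a", "b", "c", "d", "e", "f"),
                        stringsAsFactors = FALSE)

# The class is decided when the data.frame is created ...
class(test_factor$y)  # "factor"
class(test_char$y)    # "character"

# ... and aggregate() keeps whatever class it was given.
res_factor <- aggregate(y ~ x, data = test_factor, FUN = list)
res_char   <- aggregate(y ~ x, data = test_char, FUN = list)
class(res_factor$y[[1]])  # "factor"
class(res_char$y[[1]])    # "character"
```

If the data already exists as a factor column, converting it with as.character before aggregating should have the same effect as recreating the data.frame.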

Instead of creating a list, another approach is to paste the strings together (toString is a wrapper for paste(., collapse=', ')):

aggregate(y~x, data = test, FUN = toString)    
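As a small sketch of the difference: toString separates the values with a comma and a space, while the output asked for in the question has no space. Assuming that exact format matters, paste can be passed to aggregate directly with its collapse argument. The names `spaced` and `compact` are only for illustration.

```r
test <- data.frame(x = c(1, 1, 2, 2, 3, 3),
                   y = c("a", "b", "c", "d", "e", "f"),
                   stringsAsFactors = FALSE)

# toString() joins the values of each group with ", "
spaced <- aggregate(y ~ x, data = test, FUN = toString)
spaced$y   # "a, b" "c, d" "e, f"

# extra arguments to aggregate() are passed on to FUN, so collapse goes to paste()
compact <- aggregate(y ~ x, data = test, FUN = paste, collapse = ",")
compact$y  # "a,b" "c,d" "e,f"
```

In both cases 'y' ends up as an ordinary character column rather than a list column.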

Or we can use data.table as an alternative approach. We convert the 'data.frame' to a 'data.table' with setDT(test), and then, grouped by 'x', wrap the 'y' values of each group in a list.

library(data.table)
setDT(test)[, list(y=list(y)), by = x]
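A hedged variant of the same idea, assuming the data.table package is installed: as.data.table is used here instead of setDT so the original data.frame is not modified by reference, and a second call shows the pasted form next to the list column. The names `dt`, `nested` and `pasted` are made up for this sketch.

```r
library(data.table)

test <- data.frame(x = c(1, 1, 2, 2, 3, 3),
                   y = c("a", "b", "c", "d", "e", "f"),
                   stringsAsFactors = FALSE)

# as.data.table() makes a copy; setDT() would convert 'test' in place instead.
dt <- as.data.table(test)

# One row per x; y is a list column holding one character vector per group.
nested <- dt[, .(y = list(y)), by = x]

# One row per x; y is a single comma-separated string per group.
pasted <- dt[, .(y = toString(y)), by = x]
```

Here `.()` is data.table's shorthand for `list()`, so the first call is equivalent to the one in the answer.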
akrun

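Here's one way with base R:

# split the rows by 'x', collapse the 'y' values of each piece, then stack the pieces
res <- lapply(split(test, test$x), function(xx) data.frame(x = unique(xx$x),
   y = paste(xx$y, collapse = ", ")))
do.call(rbind, res)
#   x    y
# 1 1 a, b
# 2 2 c, d
# 3 3 e, f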
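The split step can also be applied to the 'y' column alone, which already gives the per-group vectors the question asks for. The following is only a sketch of that shortcut, with `groups` and `res2` as illustrative names.

```r
test <- data.frame(x = c(1, 1, 2, 2, 3, 3),
                   y = c("a", "b", "c", "d", "e", "f"),
                   stringsAsFactors = FALSE)

# Named list with one character vector of 'y' values per distinct 'x'.
groups <- split(test$y, test$x)
groups[["1"]]  # "a" "b"

# Collapse each group to one string and rebuild a data frame from the pieces.
res2 <- data.frame(
  x = as.numeric(names(groups)),
  y = unname(vapply(groups, paste, character(1), collapse = ", ")),
  stringsAsFactors = FALSE
)
```

The list in `groups` may be all that is needed if the goal is to work with the vectors rather than to print a table.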
Whitebeard

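You can use nest from tidyr (the call below uses the interface available when this answer was written; tidyr 1.0.0 and later expect a named argument, e.g. nest(test, data = y), and the printed output will differ):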

library(tidyr)

nest(test, y)

Source: local data frame [3 x 2]
Groups: <by row>

      x           y
  (dbl)       (chr)
1     1 <S3:factor>
2     2 <S3:factor>
3     3 <S3:factor>

These <S3:factor> are really lists of what you want:

[[1]]
[1] a b
Levels: a b c d e f

[[2]]
[1] c d
Levels: a b c d e f

[[3]]
[1] e f
Levels: a b c d e f
jeremycg