I am a bit confused here and not able to find a good answer.

I have a dataframe that I am trying to aggregate:

dt <- data.frame(age = rchisq(20, 10), group = sample(1:2, 20, replace = TRUE))

When I aggregate this data frame and assign the result to a new object, the global environment shows only 2 observations of 2 variables:

ag<-aggregate(age ~ group, dt, function(x) c(mean = mean(x), sd = sd(x)))
    
    group   age
1   1   9.119008
2   2   9.740361

Namely, the columns group and age. When I run the same call directly in the console, it prints three columns (group, age.mean, and age.sd), as expected:

aggregate(age ~ group, dt, function(x) c(mean = mean(x), sd = sd(x)))

  group age.mean   age.sd
1     1 9.119008 3.611732
2     2 9.740361 4.163281

Even printing the saved data frame to the console with ag shows all three columns. Why does the third column not show up in the global environment, and how can I get it there?

jay.sf
Rivered

3 Answers

It works just fine in my console:

> dt <- data.frame(age=rchisq(20,10),group=sample(1:2,20,rep=T))
> ag<-aggregate(age ~ group, dt, function(x) c(mean = mean(x), sd = sd(x)))
> ag
  group  age.mean    age.sd
1     1 11.176997  4.439366
2     2 11.374782  4.416337
> aggregate(age ~ group, dt, function(x) c(mean = mean(x), sd = sd(x)))
  group  age.mean    age.sd
1     1 11.176997  4.439366
2     2 11.374782  4.416337
TheDuud

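The problem is that aggregate returns a matrix column whenever FUN returns more than one value per group, as it does here. The result really has only two columns, group and age, where age is a two-column matrix; print merely displays the matrix columns as age.mean and age.sd. To get separate columns, pass the result through do.call(data.frame, ...), that's all.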

ag1 <- aggregate(age ~ group, dt, function(x) c(mean=mean(x), sd=sd(x)))
str(ag1)
# 'data.frame': 2 obs. of  2 variables:
#  $ group: int  1 2
#  $ age  : num [1:2, 1:2] 9.06 11 3.28 4.8
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : NULL
#   .. ..$ : chr [1:2] "mean" "sd"
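
To see the matrix column directly, a quick check along these lines may help. This is only a sketch: set.seed(1) is my own addition so the example is reproducible, it is not part of the question.

```r
set.seed(1)  # assumption: fixed seed, added only for reproducibility
dt <- data.frame(age = rchisq(20, 10), group = sample(1:2, 20, replace = TRUE))
ag1 <- aggregate(age ~ group, dt, function(x) c(mean = mean(x), sd = sd(x)))

ncol(ag1)            # 2: only group and age
is.matrix(ag1$age)   # TRUE: age is a matrix stored in a single column
colnames(ag1$age)    # "mean" "sd"
ag1$age[, "mean"]    # group means, pulled out of the matrix column
```

This is why the environment lists 2 variables while the printed output appears to have three columns.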

Flatten it into a regular data frame:

res <- do.call(data.frame, ag1)
res
#   group  age.mean   age.sd
# 1     1  9.061935 3.283173
# 2     2 10.998478 4.798354

str(res)
# 'data.frame': 2 obs. of  3 variables:
#  $ group   : int  1 2
#  $ age.mean: num  9.06 11
#  $ age.sd  : num  3.28 4.8

All in one:

res <- do.call(data.frame, aggregate(age ~ group, dt, function(x)
  c(mean=mean(x), sd=sd(x))))
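
To make explicit what do.call(data.frame, ...) is doing, here is a rough equivalent written out by hand. It is a sketch under my own assumptions: the seed and the name res2 are hypothetical, chosen only for illustration.

```r
set.seed(1)  # assumption: fixed seed, added only for reproducibility
dt <- data.frame(age = rchisq(20, 10), group = sample(1:2, 20, replace = TRUE))
ag1 <- aggregate(age ~ group, dt, function(x) c(mean = mean(x), sd = sd(x)))

# data.frame() expands a named matrix argument into one column per matrix
# column, prefixing each with the argument name (age.mean, age.sd).
res2 <- data.frame(group = ag1$group, age = ag1$age)

# do.call() passes the columns of ag1 as those same named arguments.
res <- do.call(data.frame, ag1)

names(res2)           # "group" "age.mean" "age.sd"
all.equal(res, res2)  # TRUE
```

The do.call form is shorter and keeps working if there are several grouping columns, which is why I would stick with it.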

Data:

dt <- data.frame(age = rchisq(20, 10), group = sample(1:2, 20, replace = TRUE))
jay.sf

I cannot comment because my reputation is too low, so I am posting this as an answer.

In addition to jay.sf's answer, this post gives a detailed explanation of this behavior of aggregate.

andrein