
I use aggregate() to sum the value column for each level of site in the R data.frame given below:

set.seed(2013)
df <- data.frame(site = sample(c("A", "B", "C"), 10, replace = TRUE),
                 currency = sample(c("USD", "EUR", "GBP", "CNY", "CHF"), 10,
                                   replace = TRUE, prob = c(10, 6, 5, 6, 0.5)),
                 value = sample((1:10) / 10, 10, replace = FALSE))

df.site.sums <- aggregate(value ~ site, data=df, FUN=sum)
df.site.sums

#  site value
#1    A   0.2
#2    B   0.6
#3    C   4.7

However, I would like to be able to specify the row order of the resulting df.site.sums. For instance like:

reorder <- c("C","B","A")
# special_sort(df.site.sums, BY = site, ORDER = reorder)  # imaginary function
#  site value
#1    C   4.7
#2    B   0.6
#3    A   0.2

How can I do this using base R? Just to be clear, this is essentially a data frame row ordering question where the context is the aggregate() function (which may or may not matter).

This is relevant but does not directly address my issue, or perhaps I am missing the crux of its solution.


UPDATE

For future reference, I found a way to order a data.frame's rows with respect to a target vector at this link. I guess it can be applied as a post-processing step after aggregate():

df.site.sums[match(reorder,df.site.sums$site),]
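To make the match() step above easier to follow, here is a minimal sketch. It assumes a hand-built data frame (called `sums` here, a made-up name) holding the same values as the df.site.sums output shown earlier, so it does not depend on the random sampling.

```r
# Hand-built stand-in for df.site.sums (values copied from the output above)
sums <- data.frame(site = c("A", "B", "C"), value = c(0.2, 0.6, 4.7))
reorder <- c("C", "B", "A")

# match() gives, for each element of reorder, its position in sums$site
idx <- match(reorder, sums$site)
idx

# Indexing the rows by those positions puts them in the target order
sorted <- sums[idx, ]
rownames(sorted) <- NULL  # optional: renumber the rows 1..n
sorted
```

One thing to watch: match() returns NA for any value in reorder that is absent from the site column, and indexing with NA yields a row of NAs.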
Zhubarb

1 Answer


This may be a possibility: convert 'site' to a factor and specify the order in levels.

df$site2 <- factor(df$site, levels = c("C", "B", "A"))
aggregate(value ~ site2, data = df, FUN = sum)

#   site2 value
# 1     C   4.7
# 2     B   0.6
# 3     A   0.2
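As a rough illustration of why this works: aggregate() returns its groups in the order of the grouping factor's levels, so setting the levels sets the row order. The sketch below uses a small made-up data frame (`toy`, not the df from the question) so the sums are easy to check by hand.

```r
# Small hand-made data set; names and values are illustrative only
toy <- data.frame(site  = c("A", "B", "C", "A", "C"),
                  value = c(1, 2, 3, 4, 5))

# Default: groups come out in sorted order (A, B, C)
default <- aggregate(value ~ site, data = toy, FUN = sum)
default

# Custom: the factor levels dictate the row order (C, B, A)
toy$site2 <- factor(toy$site, levels = c("C", "B", "A"))
custom <- aggregate(value ~ site2, data = toy, FUN = sum)
custom
```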

Update following @Ananda Mahto's comment (thanks!). You can use the 'non-formula' approach of aggregate:

reorder <- c("C", "B", "A")
with(df, aggregate(x = list(value = value),
                   by = list(site = factor(site, levels = reorder)),
                   FUN = sum))
#   site value
# 1    C   4.7
# 2    B   0.6
# 3    A   0.2

Or convert to factor within the formula interface and then rename the resulting site column, which otherwise gets the whole factor(...) expression as its name:

df2 <- aggregate(value ~ factor(site, levels = c("C", "B", "A")),
                 data = df, FUN = sum)
df2
names(df2) <- c("site", "value")
df2
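If you prefer to do the renaming in the same step, one option is to wrap the call in setNames(). This is only a sketch on a made-up data frame (`toy`), with the target order stored in a hypothetical variable `site_order`:

```r
# Illustrative data and target order (both made up for this example)
toy <- data.frame(site  = c("A", "B", "C", "A", "C"),
                  value = c(1, 2, 3, 4, 5))
site_order <- c("C", "B", "A")

# Aggregate on the re-levelled factor, then replace the awkward
# auto-generated column name with plain "site"
res <- setNames(
  aggregate(value ~ factor(site, levels = site_order), data = toy, FUN = sum),
  c("site", "value")
)
res
```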
Henrik
  • Thanks Henrik, is there an alternative to specify the order after or during the aggregate()? – Zhubarb Dec 16 '13 at 11:52
  • I see that you already have updated your question with an 'after alternative'. I was going to point you to something similar. Right now I don't know about a clean 'during' alternative. – Henrik Dec 16 '13 at 12:11
  • @Zhubarb, there is also the *non-formula* approach to `aggregate`: `with(df, aggregate(list(value = value), list(site = factor(site, reorder)), FUN=sum))`. Henrik, you can also use `factor` in the formula version, but the resulting column names are funky, so renaming would be required. – A5C1D2H2I1M1N2O1R2T1 Dec 16 '13 at 13:22
  • @AnandaMahto, thanks a lot for suggesting the non-formula approach. I have added it to the answer. Cheers. – Henrik Dec 16 '13 at 13:27
  • @AnandaMahto, yes, I tried using `factor` in the formula version before I posted my first answer, but skipped it due to the funkiness of the names. I haven't tried specifying `x` as a `list` before - useful indeed to be able to set the names of the results column right away. Cheers. – Henrik Dec 16 '13 at 13:39
  • @AnandaMahto, thanks a lot, that solves it! (And thanks Henrik for updating the accepted answer.) – Zhubarb Dec 16 '13 at 14:15