I'd like to use the aggregate function but have the output ordered (smallest to largest) by 2 columns: first by one column, and then, within each value of it, by the other.

Here is an example:

test<-data.frame(c(sample(1:4),1),sample(2001:2005),11:15,c(letters[1:4],'a'),sample(101:105))
names(test)<-c("plot","year","age","spec","biomass")
test
  plot year age spec biomass
1    2 2001  11    a     102
2    4 2005  12    b     101
3    1 2004  13    c     105
4    3 2002  14    d     103
5    1 2003  15    a     104

aggregate(biomass~plot+year,data=test,FUN='sum')

This creates output with just year ordered from smallest to largest.

  plot year biomass
1    2 2001     102
2    3 2002     103
3    1 2003     104
4    1 2004     105
5    4 2005     101

But I'd like the output to be ordered by plot and THEN year.

  plot year biomass
1    1 2003     104
2    1 2004     105
3    2 2001     102
4    3 2002     103
5    4 2005     101

Thanks!!

theforestecologist
  • Please use `set.seed` before generating random data (as you do with `sample`). – Frank Feb 18 '15 at 21:00
  • I'm sure you know that switching the two variables' order in `aggregate` does sort as you desire. If you want the columns in that particular order, it is simple to reorder them at the end: `aggregate(biomass~year+plot,data=test,FUN='sum')[,c(2,1,3)]` – Frank Feb 18 '15 at 21:02
  • I understand I can both switch their order in the code and include code to reorder them at the end. However, I need to use this on a dataset with over 50 plots, of 80 years each, so this method is not really practical for that use. thanks. – theforestecologist Feb 18 '15 at 21:05
  • Why can't you sort the data afterwards? – David Arenburg Feb 18 '15 at 21:24
  • I could. But I'm also throwing this into a loop to do this for 100 different species, and I don't want the loop to be more complicated than it has to be. If there is a way to simply incorporate ordering by multiple columns within aggregate, I could really cut down on the length of my code. Thanks. – theforestecologist Feb 18 '15 at 21:35
  • Why do you need to throw it into a loop? Can you provide an example of what you are actually doing? – David Arenburg Feb 18 '15 at 21:43
  • http://stackoverflow.com/questions/20609564/r-specifying-a-desired-row-order-for-the-output-data-frame-of-aggregate does this help? – rmuc8 Feb 18 '15 at 22:02
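
As a side note on the `set.seed` suggestion in the comments above, here is a minimal sketch of how the example data could be made reproducible. The helper name `make_test` and the seed value 42 are assumptions made for illustration, and the values drawn will not match the table printed in the question.

```r
# Hypothetical helper: build the example data with a fixed seed so that
# every run (and every reader) gets the same random draw
make_test <- function(seed) {
  set.seed(seed)
  data.frame(
    plot    = c(sample(1:4), 1),
    year    = sample(2001:2005),
    age     = 11:15,
    spec    = c(letters[1:4], "a"),
    biomass = sample(101:105)
  )
}

# The seed value 42 is arbitrary; the same seed gives the same data
test1 <- make_test(42)
test2 <- make_test(42)
print(identical(test1, test2))
```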

1 Answer

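The aggregate function does sort its output by the grouping columns, with the last grouping variable in the formula varying slowest. Switch the order of the variables in the formula to get your desired sorting: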

# switch from
a0 <- aggregate(biomass~plot+year,data=test,FUN='sum')
# to
a  <- aggregate(biomass~year+plot,data=test,FUN='sum')

The result `a` is sorted by plot and then by year, as described in the question. No further sorting is needed.


If you want to change the order in which the columns are displayed to exactly match your desired output, try a[,c(2,1,3)]. This reordering is not computationally costly: my understanding is that a data.frame is a list of pointers to column vectors, and this just reorders that list.
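
To make this concrete, here is a minimal sketch that rebuilds the question's data with fixed values (copied from the printed table rather than drawn with `sample`, which is an assumption made here for reproducibility) and applies the switched formula. The columns are then reordered by name instead of by position.

```r
# Fixed copy of the question's example data (values from the printed table)
test <- data.frame(
  plot    = c(2, 4, 1, 3, 1),
  year    = c(2001, 2005, 2004, 2002, 2003),
  age     = 11:15,
  spec    = c("a", "b", "c", "d", "a"),
  biomass = c(102, 101, 105, 103, 104)
)

# With plot last in the formula, the rows come out ordered by plot,
# and by year within each plot
a <- aggregate(biomass ~ year + plot, data = test, FUN = sum)

# Put the columns in the order plot, year, biomass (selected by name)
a <- a[, c("plot", "year", "biomass")]
print(a)
```

Selecting columns by name avoids having to keep track of positional indices if more grouping variables are added to the formula later.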

Frank
  • @theforestecologist If there is some complicating factor that makes this infeasible, that factor should be more clearly specified in the original question. To "do this for 100" things is too vague. – Frank Feb 18 '15 at 22:05
  • Maybe use `setorder(setDT(a0), plot, year)`? – David Arenburg Feb 18 '15 at 22:18
  • @DavidArenburg Sure, but why are we sorting it ex-post when simply changing the order of arguments to `aggregate` does the trick? The OP doesn't show an interest in data.tables, but that approach could also be done with data.frames, something like `a0[with(a0,order(plot,year)),]` or something. Yours is a different approach, I think, and it deserves a separate answer. – Frank Feb 18 '15 at 22:36
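
For comparison, the after-the-fact sorting mentioned in the comments above could look roughly like this in base R. This is only a sketch: the data is a fixed copy of the question's printed table (an assumption, since the original used `sample`), and the name `a0_sorted` is made up for illustration.

```r
# Fixed copy of the question's example data (values from the printed table)
test <- data.frame(
  plot    = c(2, 4, 1, 3, 1),
  year    = c(2001, 2005, 2004, 2002, 2003),
  age     = 11:15,
  spec    = c("a", "b", "c", "d", "a"),
  biomass = c(102, 101, 105, 103, 104)
)

# Original formula order: the output comes back ordered by year
a0 <- aggregate(biomass ~ plot + year, data = test, FUN = sum)

# Sort afterwards by plot, then by year within each plot
a0_sorted <- a0[with(a0, order(plot, year)), ]
rownames(a0_sorted) <- NULL  # renumber rows 1..n after sorting
print(a0_sorted)
```

Because `order` accepts any number of keys, this approach extends to more than two grouping columns without touching the `aggregate` call.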