4

How do I use R to find sums for values by species? I have 62 species names and want to add the basal area for each species in a habitat. I tried

aggregate(file name, species, FUN=sum, simplify=TRUE)

and many variations on this but it always says that it cannot find species (a column header in my data set). The list is too long to add as input; I want the program to use my column information. My data set looks something like this:

Species      BA
sp1          0.5
sp1          0.2
sp2          0.1
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
Kim
  • 41
  • 2
  • 3
    Even if your entire dataset is too big to include here, providing a minimal [*reproducible* example](http://stackoverflow.com/q/5963269) of your problem will likely elicit the best answers! – BenBarnes Dec 02 '12 at 21:16
  • When you write in your sample code `aggregate(file name...`, you *have* already read the file into R, right? – A5C1D2H2I1M1N2O1R2T1 Dec 04 '12 at 09:45

2 Answers2

5

First read the data into a data frame, then use aggregate.

# You would read from the file instead.
x <- read.table(header=T, file=textConnection("Species      BA
sp1          0.5
sp1          0.2
sp2          0.1
"))

> aggregate(.~Species, data=x, FUN=sum)
  Species  BA
1     sp1 0.7
2     sp2 0.1
Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112
4

The data.table package is great for this sort of stuff.

First, load the package and create some data.

library(data.table)
set.seed(1)
dat <- data.table(Species = paste("s", sample(1:3, 15, replace = TRUE), sep = ""),
                  BA = rnorm(15), CA = rnorm(15), DA = rnorm(15), EA = rnorm(15),
                  key="Species")
dat
#     Species           BA          CA         DA          EA
#  1:      s1 -0.005767173  0.80418951  1.2383041 -0.79533912
#  2:      s1 -1.147657009 -0.69095384 -0.4527840 -0.17262350
#  3:      s1 -0.891921127 -0.43331032  1.1565370 -0.94064916
#  4:      s1  0.435683299 -0.64947165  0.8320471 -0.11582532
#  5:      s1 -1.237538422  0.72675075 -0.2273287 -0.81496871
#  6:      s2  2.404653389 -0.05710677 -0.2793463 -0.05487747
#  7:      s2  0.763593461  0.50360797  1.7579031  0.25014132
#  8:      s2 -0.411510833 -0.23570656 -1.0655906  0.35872890
#  9:      s2  0.252223448 -0.54288826 -1.5637821 -0.01104548
# 10:      s2  0.377395646  0.99216037 -0.3767027 -1.42509839
# 11:      s3 -0.799009249  1.08576936  0.5607461  0.61824329
# 12:      s3 -0.289461574 -1.28459935 -0.8320433 -2.22390027
# 13:      s3 -0.299215118  0.04672617 -1.1665705 -1.26361438
# 14:      s3 -0.224267885  1.15191175  0.2661374  0.24226348
# 15:      s3  0.133336361 -0.42951311  2.4413646  0.36594112

Note: if you already have a data.frame (which I presume you do) you can just use data.table(YourDataFrame, key=YourGroupingColumns)

Here's the actual aggregation. .SD is the subset of the columns of your data. By default, that's all the columns in your data excluding the keys (your grouping column; in this case, we specified "Species" as our key).

dat[, lapply(.SD, sum), by=key(dat)]
#    Species        BA         CA        DA         EA
# 1:      s1 -2.847200 -0.2427955  2.546776 -2.8394058
# 2:      s2  3.386355  0.6600668 -1.527519 -0.8821511
# 3:      s3 -1.478617  0.5702948  1.269634 -2.2610668

However, there is also an .SDcols argument that lets you specify which columns you're interested in either by name or numeric index.

dat[, lapply(.SD, sum), by=key(dat), .SDcols = "DA"]
#    Species        DA
# 1:      s1  2.546776
# 2:      s2 -1.527519
# 3:      s3  1.269634
dat[, lapply(.SD, sum), by=key(dat), .SDcols = 2:3]
#    Species        BA         CA
# 1:      s1 -2.847200 -0.2427955
# 2:      s2  3.386355  0.6600668
# 3:      s3 -1.478617  0.5702948
# or perhaps, more easily understood.
dat[, lapply(.SD, sum), by=key(dat), .SDcols = c('BA','CA')]
#    Species        BA         CA
# 1:      s1 -2.847200 -0.2427955
# 2:      s2  3.386355  0.6600668
# 3:      s3 -1.478617  0.5702948
mnel
  • 113,303
  • 27
  • 265
  • 254
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485