
So initially I had the following object:

> head(gs)
  year disturbance lek_id  complex tot_male
1 2006           N     3T  Diamond        3
2 2007           N     3T  Diamond       17
3 1981           N   bare 3corners        4
4 1982           N   bare 3corners        7
5 1983           N   bare 3corners        2
6 1985           N   bare 3corners        5

From that I computed summary statistics (min, max, mean and sd) of tot_male for each year within each complex. I used aggregate() once per statistic, assigned descriptive column names, and stored each result as a separate object.

> tyc_min = aggregate(gs$tot_male, by=list(gs$year, gs$complex), FUN=min)
> names(tyc_min) = c("year", "complex", "tot_male_min")
> tyc_max = aggregate(gs$tot_male, by=list(gs$year, gs$complex), FUN=max)
> names(tyc_max) = c("year", "complex", "tot_male_max")
> tyc_mean = aggregate(gs$tot_male, by=list(gs$year, gs$complex), FUN=mean)
> names(tyc_mean) = c("year", "complex", "tot_male_mean")
> tyc_sd = aggregate(gs$tot_male, by=list(gs$year, gs$complex), FUN=sd)
> names(tyc_sd) = c("year", "complex", "tot_male_sd")

Example output (second object, tyc_max):

  year  complex tot_male_max
1 2003                     0
2 1970 3corners           26
3 1971 3corners           22
4 1972 3corners           26
5 1973 3corners           32
6 1974 3corners           18

Now I need to add the number of samples per year/complex combination as well. Then I need to merge everything into a single data object and export it as a .csv file.

I know I need to use the merge() function along with all.y, but I have no idea how to handle this error:

Error in fix.by(by.x, x) : 
  'by' must specify one or more columns as numbers, names or logical

Alternatively, how do I add the number of samples per year and complex? Any suggestions?

Brian Tompsett - 汤莱恩

2 Answers

This might work (but it is hard to check without a reproducible example):

gsnew <- Reduce(function(...) merge(..., all = TRUE, by = c("year","complex")), 
                list(tyc_min, tyc_max, tyc_mean, tyc_sd))
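The question also asks for the number of samples per year/complex combination and for a .csv export. One way to get both with the merge approach is to build a fifth table with FUN = length and add it to the list. This is only a sketch: the small gs data frame below is invented for illustration, and agg_stat, tyc_n, n_samples and gs_stats.csv are hypothetical names, not part of the original code.

```r
# Toy stand-in for gs; the values are invented for illustration only
gs <- data.frame(
  year     = c(2006, 2006, 2007, 2007, 2007),
  complex  = c("Diamond", "Diamond", "Diamond", "3corners", "3corners"),
  tot_male = c(3, 17, 4, 7, 2)
)

# Hypothetical helper: one aggregate() call per statistic, with named columns
agg_stat <- function(fun, col_name) {
  out <- aggregate(gs$tot_male, by = list(gs$year, gs$complex), FUN = fun)
  names(out) <- c("year", "complex", col_name)
  out
}

tyc_min  <- agg_stat(min,    "tot_male_min")
tyc_max  <- agg_stat(max,    "tot_male_max")
tyc_mean <- agg_stat(mean,   "tot_male_mean")
tyc_sd   <- agg_stat(sd,     "tot_male_sd")
tyc_n    <- agg_stat(length, "n_samples")   # rows per year/complex group

# Merge all five tables on the two key columns
gsnew <- Reduce(function(x, y) merge(x, y, all = TRUE, by = c("year", "complex")),
                list(tyc_min, tyc_max, tyc_mean, tyc_sd, tyc_n))

# Export; tempdir() is used here only so the sketch does not write
# into the working directory
out_file <- file.path(tempdir(), "gs_stats.csv")
write.csv(gsnew, out_file, row.names = FALSE)
```

Note that sd() returns NA for groups with a single observation, so tot_male_sd will contain NA wherever n_samples is 1.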

Instead of aggregating each statistic separately and then merging, you can also compute everything at once into a new data frame or data.table, for example with data.table, dplyr or base R, so that no merge is needed afterwards (for a base R solution, see the other answer):

library(data.table)
gsnew <- setDT(gs)[, .(male_min = min(tot_male),
                       male_max = max(tot_male),
                       male_mean = mean(tot_male),
                       male_sd = sd(tot_male)), by = .(year, complex)]

library(dplyr)
gsnew <- gs %>% group_by(year, complex) %>%
  summarise(male_min = min(tot_male),
            male_max = max(tot_male),
            male_mean = mean(tot_male),
            male_sd = sd(tot_male))
Jaap
mystat <- function(x) c(mi=min(x), ma=max(x))
aggregate(Sepal.Length~Species, FUN=mystat, data=iris)

For your data:

mystat <- function(x) c(mi=min(x), ma=max(x), m=mean(x), s=sd(x), l=length(x))
aggregate(tot_male~year+complex, FUN=mystat, data=gs)
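One detail to keep in mind before exporting: when FUN returns several values, aggregate() stores them in a single matrix column (here tot_male) rather than in separate columns. A possible way to flatten it into ordinary columns is do.call(data.frame, ...). The sketch below assumes a small made-up gs data frame; res, flat and gs_stats.csv are hypothetical names used only for illustration.

```r
# Toy stand-in for gs; the values are invented for illustration only
gs <- data.frame(
  year     = c(2006, 2006, 2007, 2007, 2007),
  complex  = c("Diamond", "Diamond", "Diamond", "3corners", "3corners"),
  tot_male = c(3, 17, 4, 7, 2)
)

mystat <- function(x) c(mi = min(x), ma = max(x), m = mean(x), s = sd(x), l = length(x))
res <- aggregate(tot_male ~ year + complex, FUN = mystat, data = gs)

# res has only three columns; the third one is a matrix holding all statistics
is.matrix(res$tot_male)

# Flatten the matrix column into regular columns (tot_male.mi, tot_male.ma, ...)
flat <- do.call(data.frame, res)
names(flat)

# Export; tempdir() is used here only so the sketch does not write
# into the working directory
out_file <- file.path(tempdir(), "gs_stats.csv")
write.csv(flat, out_file, row.names = FALSE)
```

The column l holds the number of samples per year/complex combination, so no separate counting step or merge is needed with this approach.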
jogo