0

Newbie here.

I have a dataset with columns "YEAR" (2014-2019), "SITE" (7 SITES), "TRANSECT" (UPSTREAM,DOWNSTREAM), and about 50 Insect species columns containing counts of individuals. I want to average the upstream and downstream samples for each year and site. The end goal is a dataset with columns "YEAR", "SITE", and the 50 Insect species columns containing the mean of the upstream and downstream counts. I have tried several methods to do this but have been unsuccessful. The following code is the last thing I have tried.

INS_YxS<-aggregate(INV.MEANS[5:54], INV.MEANS[1:3], mean)

Columns 1-4 in this dataset are X, YEAR, SITE, TRANSECT. 5-54 are Insect Species.

The resulting dataset appeared to have the correct columns but it looks like it just removed the TRANSECT column without averaging the upstream and downstream species counts... Anyone know how to accomplish what I am trying to do?

Here is a visual representation of what my data looks like (table 1) and what I want it to look like (table 2): https://i.stack.imgur.com/WkX4e.png

Notice that in 2 there is no TRANSECT column and that the new values in the insect columns are the means of the UPSTREAM and DOWNSTREAM TRANSECT rows for each YEAR SITE resulting in fewer rows.

Apologies, I am trying to find the best way to explain what I want to do...

I know the answer is out there and depends on me asking the correct question...

Thank you!!!

  • If you share your data (or a simulated version of it) it will be easier to help you. See [providing a minimal dataset](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for help with that if you need it. – Andrea M Mar 20 '22 at 21:52

1 Answers1

0

Consider the formula version for aggregate with dot notation:

INSECTS_MEANS <- aggregate(
    . ~ YEAR + SITE + TRANSECT,
    data=INSECTS_COUNTS, 
    FUN=mean, na.rm=TRUE, 
    na.action=na.omit
)

Otherwise you need to pass lists into by argument:

INSECTS_MEANS <- aggregate(
    x = INSECTS_COUNTS[5:ncol(INSECTS_COUNTS)],
    by = list(
        YEAR = INSECTS_COUNTS$YEAR,
        SITE = INSECTS_COUNTS$SITE,
        TRANSECT = INSECTS_COUNTS$TRANSECT
    ), 
    FUN=mean, na.rm=TRUE, 
    na.action=na.omit
)
Parfait
  • 104,375
  • 17
  • 94
  • 125