Is there an R function for setting rows on aggregate data?

Question

The data I am working with is from eBird, and I am looking to sort out species occurrence by both name and year. There are over 30k individual observations, each with its own number of birds. For example, in the raw data below, someone observed 2 Cooper's Hawks on Jan 1, 2021.

Raw looks like this:

specificName individualCount eventDate year
Cooper's Hawk 1 (1/1/2018) 2018
Cooper's Hawk 1 (1/1/2020) 2020
Cooper's Hawk 2 (1/1/2021) 2021

Ideally, I would be able to group all the Cooper's Hawk rows (specificName) by the year they were observed and sum the individualCount values. That way I can make statistical comparisons between the number of birds observed in 2018, 2019, 2020, and 2021.

I created the separate column for the year:
year <- as.POSIXct(ebird.df$eventDate, format = "%m/%d/%Y")
ebird.df$year <- as.numeric(format(year, "%Y"))

Then I aggregated with the following:
aggdata <- aggregate(ebird.df$individualCount , by = list( ebird.df$specificname, ebird.df$year ), FUN = sum)

There are hundreds of bird species, and Cooper's Hawks start on the 115th row, so the output looks like this:

Group.1 Group.2 x
115 2018 Cooper's Hawk 86
116 2019 Cooper's Hawk 152
117 2020 Cooper's Hawk 221
118 2021 Cooper's Hawk 116

My question is how do I get the data into a table that looks like the following:

Species Name 2018 2019 2020 2021
Cooper's Hawk 86 152 221 116

I want to eventually run some basic ecology stats on the data using vegan, but one problem first I guess lol
Thanks!

Have a look at `tidyr::pivot_wider()` https://tidyr.tidyverse.org/reference/pivot_wider.html — Julian, May 13 '22 at 11:40
Does this answer your question? [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) — s__, May 13 '22 at 11:50

G. Grothendieck · Accepted Answer · 2022-05-13T12:22:44.807

There are errors in the data and code in the question, so we used the code and reproducible data given in the Note at the end.

Now, using xtabs we get an xtabs table directly from ebird.df like this. No packages are used.

xtabs(individualCount ~ specificName + year, ebird.df)
##                year
## specificName    2018 2020 2021
##   Cooper's Hawk    1    1    2
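
The sample data has no 2019 rows, so the table has no 2019 column. If every year from 2018 to 2021 should appear even when nothing was observed, one option is to turn year into a factor with explicit levels before calling xtabs. This is only a sketch: the 2018:2021 range is taken from the question, and the data frame below simply mirrors the Note.

```r
# Sample data mirroring the Note at the end (assumed to match the real columns)
ebird.df <- data.frame(
  specificName = c("Cooper's Hawk", "Cooper's Hawk", "Cooper's Hawk"),
  individualCount = c(1, 1, 2),
  year = c(2018, 2020, 2021)
)

# Explicit levels keep years with no observations (here 2019) in the table
ebird.df$year <- factor(ebird.df$year, levels = 2018:2021)

# xtabs keeps unused factor levels by default, so 2019 shows up as a zero column
tab <- xtabs(individualCount ~ specificName + year, ebird.df)
tab
```

Whether a zero is the right value for a year without records depends on how the surveys were done, so treat it as a choice rather than a given.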

Optionally convert it to a data.frame:

xtabs(individualCount ~ specificName + year, ebird.df) |> 
  as.data.frame.matrix()
##               2018 2020 2021
## Cooper's Hawk    1    1    2
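
The table asked for in the question has the species as an ordinary column rather than as row names. One possible way to get that, assuming the same sample data as in the Note (the column name Species is our own choice, not something from eBird):

```r
# Sample data mirroring the Note at the end (assumed to match the real columns)
ebird.df <- data.frame(
  specificName = c("Cooper's Hawk", "Cooper's Hawk", "Cooper's Hawk"),
  individualCount = c(1, 1, 2),
  year = c(2018, 2020, 2021)
)

# Cross-tabulate and convert to a plain data frame (species end up as row names)
wide <- as.data.frame.matrix(
  xtabs(individualCount ~ specificName + year, ebird.df)
)

# Copy the row names into a regular first column, then drop the row names
wide <- cbind(Species = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

The year columns keep names such as 2018, so they need backticks when accessed with $, for example wide$`2018`.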

Although we did not need aggdata, if you need it for some other reason it can be computed using the formula method of aggregate like this:

aggregate(individualCount ~ specificName + year, ebird.df, sum)
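
If you already have the aggregated long-format data and want the wide table from it, base R's reshape is one way to do that. This is a sketch using the Note's sample data; note that reshape prefixes the new columns with the name of the value column, giving names like individualCount.2018.

```r
# Sample data mirroring the Note at the end (assumed to match the real columns)
ebird.df <- data.frame(
  specificName = c("Cooper's Hawk", "Cooper's Hawk", "Cooper's Hawk"),
  individualCount = c(1, 1, 2),
  year = c(2018, 2020, 2021)
)

# Long format: one row per species/year combination with the summed count
aggdata <- aggregate(individualCount ~ specificName + year, ebird.df, sum)

# Wide format: one row per species, one individualCount.<year> column per year
wide <- reshape(aggdata,
                idvar = "specificName",
                timevar = "year",
                direction = "wide")
wide
```

Species/year combinations that are missing from aggdata come out as NA here, whereas xtabs fills them with 0.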

Note

Lines <- "specificName,individualCount,eventDate,year
\"Cooper's Hawk\",1,(1/1/2018),2018
\"Cooper's Hawk\",1,(1/1/2020),2020
\"Cooper's Hawk\",2,(1/1/2021),2021"
ebird.df <- read.csv(text = Lines, strip.white = TRUE)

Thanks! I was trying all day to get the data organized without having to manually sort out the raw data! This worked perfectly. I used aggregate because it was the closest I could get to compiling the data in some format. I converted to a data frame using `xtabs/as.data.frame.matrix` suggestion. Thanks so much! — ReverendMachine, May 13 '22 at 19:29
