I have a phenology 'data' composed of 310890 rows, and 5 columns named 'Species', 'Observation_Year', 'Observation_Day', 'Latitude' and 'Longitude'.
str(data)
'data.frame': 310890 obs. of 5 variables:
$ Species : chr "Acer_campestre" "Acer_campestre" "Acer_campestre" "Acer_campestre" ...
$ Observation_Year : int 1999 1999 2000 2000 1999 1999 1999 1999 1954 1955 ...
$ Observation_Day : int 116 118 80 121 91 84 119 123 131 132 ...
$ Latitude : num 52.1 52.4 52.6 52.6 53 ...
$ Longitude : num 0.323 -1.573 -1.026 -1.128 -2.207 ...
In total, there are 403 species.
The Observation_Year
starts in 1952 and ends in 2019.
For each species, I have multiple Observation_Day
values recorded in different Observation_Year
and in different sites (Latitude and Longitude).
The Observation Years are not consecutive, so there are gaps in the time series.
I would like to create a new data frame composed of 68 rows (consecutive Years from 1952 to 2019) and 404 columns, where the first column is the Year, and the other 403 columns are the Average of the Observation Day for each year and for each species.
e.g., New data frame
Year Acer_campestre Acer_platanoides Acer_pseudoplatanus …
1952 124 114 122
1953 101 109 NA
1954 131 NA 114
1955 132 109 NA
1956 NA NA NA
… … … …
2019 NA 180 200
I tried the following code but I got the mean only for the last species, instead, I want a new data.frame with the time series of each species in columns starting from 1952 to 2019.
species <- unique(data$Species)
length(unique(species)) # 403 Different species
library(dplyr)
for(i in 1:length(species)) {
data2 <- data
data3 <- data[which(data$Species == species[i]),] %>%
group_by(Observation_Year) %>%
summarise(Mean = mean(Observation_Day, na.rm=TRUE))
assign(paste0("SubsetData",i), data3)}
Thank you for any help