0

I have a phenology 'data' composed of 310890 rows, and 5 columns named 'Species', 'Observation_Year', 'Observation_Day', 'Latitude' and 'Longitude'.

str(data)

'data.frame':   310890 obs. of  5 variables:

$ Species           : chr  "Acer_campestre" "Acer_campestre" "Acer_campestre" "Acer_campestre" ...
 
$ Observation_Year  : int  1999 1999 2000 2000 1999 1999 1999 1999 1954 1955 ...
 
$ Observation_Day   : int  116 118 80 121 91 84 119 123 131 132 ...

$ Latitude          : num  52.1 52.4 52.6 52.6 53 ...

$ Longitude         : num  0.323 -1.573 -1.026 -1.128 -2.207 ...

In total, there are 403 species. The Observation_Year starts in 1952 and ends in 2019. For each species, I have multiple Observation_Day values recorded in different Observation_Year and in different sites (Latitude and Longitude). The Observation Years are not consecutive, so there are gaps in the time series.

I would like to create a new data frame composed of 68 rows (consecutive Years from 1952 to 2019) and 404 columns, where the first column is the Year, and the other 403 columns are the Average of the Observation Day for each year and for each species.

e.g., New data frame

Year    Acer_campestre  Acer_platanoides    Acer_pseudoplatanus …   
1952    124             114                 122     
1953    101             109                 NA      
1954    131             NA                  114     
1955    132             109                 NA      
1956    NA              NA                  NA      
  …     …   …   …       
2019    NA              180                 200     

I tried the following code but I got the mean only for the last species, instead, I want a new data.frame with the time series of each species in columns starting from 1952 to 2019.

species <- unique(data$Species)

length(unique(species)) # 403 Different species

library(dplyr)

for(i in 1:length(species)) {
data2 <- data
data3 <- data[which(data$Species == species[i]),] %>%
group_by(Observation_Year) %>%
summarise(Mean = mean(Observation_Day, na.rm=TRUE))
assign(paste0("SubsetData",i), data3)}

Thank you for any help

AndrewGB
  • 16,126
  • 5
  • 18
  • 49
  • 3
    I guess you need `data %>% group_by(Observation_Year, Species) %>% summarise(Mean = mean(Observation_Day, na.rm = TRUE))` instead of loping – akrun Apr 26 '21 at 22:49
  • Thank you for your answer. However, what I got is a data.frame composed of 15642 rows and three columns. The first column is the Oservation_year that are consecutive for each species, then the name of the Species, and then the Mean for that year and that species. Instead, I would like to have a matrix or a data.frame with 68 rows only (years from 1952 to 2019) and 403 columns that are the Means of the Observation_Day for each species. – Alma Piermattei Apr 28 '21 at 15:54
  • I found the solution: `data.TS <- data %>% group_by(Observation_Year, Species_Latin_Name) %>% summarise(Mean = mean(Observation_Day, na.rm = TRUE)) %>% arrange(Species)` `library(reshape2)` ` data.TS.years <- dcast(data.TS, Observation_Year ~ paste("Mean", Species,sep="."), value.var="Mean")` Thank you again! – Alma Piermattei Apr 28 '21 at 16:44

0 Answers0