Question about grouping time series by two different columns

Question

I am working with NDVI data from 60 pixels over the past 36 years, with multiple NDVI values per year. I am trying to calculate community stability using the codyn package. However, the community_stability function requires one value per time step (i.e., year); otherwise, it sums all the NDVI values for that year per site. So I need to group by pixel (site) and by year to calculate an average per year, but I can't figure out how to group by two different columns at once. Here's a snapshot of my dataframe layout:

       Date year month_num     Season        site  NDVI            site_season
1      5309 1984        07 Transition   M1CAH1SUR 0.317   M1CAH1SUR_Transition
2      5405 1984        10        Dry   M1CAH1SUR 0.208          M1CAH1SUR_Dry
3      5613 1985        05 Transition   M1CAH1SUR 0.480   M1CAH1SUR_Transition
4      5677 1985        07 Transition   M1CAH1SUR 0.316   M1CAH1SUR_Transition
5      5693 1985        08        Dry   M1CAH1SUR 0.315          M1CAH1SUR_Dry

...

Can anyone help me with the code to group by site and year so that I get the mean NDVI for each year at each site? Any help will be greatly appreciated!

I tried using dplyr as follows:

NDVIplot_long %>%
  group_by(site, year, add = TRUE) %>%
  summarize(mean_NDVI = mean(NDVI, na.rm = TRUE))

but it only returns one value.

  mean_NDVI
1 0.2825419

I expected one value per year (1984, 1985, 1986, etc.) for each of the 60 sites. Instead, only a single value was returned.

akrun · Answer 1 · 2019-07-24T13:47:35.253

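The issue is likely that plyr is also loaded, and plyr::summarise (along with its alias summarize) masks the function of the same name from dplyr. plyr's version ignores the grouping and returns a single summary. We can call dplyr::summarise explicitly: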

library(dplyr)
NDVIplot_long %>%
  group_by(site, year, add = TRUE) %>%
  dplyr::summarize(mean_NDVI = mean(NDVI, na.rm = TRUE))
# A tibble: 2 x 3
# Groups:   site [1]
#  site       year mean_NDVI
#  <chr>     <int>     <dbl>
#1 M1CAH1SUR  1984     0.262
#2 M1CAH1SUR  1985     0.370

The single-mean output can be reproduced by calling plyr's version directly (the number differs from the OP's, probably because the OP used the full dataset):

NDVIplot_long %>%
   group_by(site, year, add = TRUE) %>%
   plyr::summarize(mean_NDVI = mean(NDVI, na.rm = TRUE))
#  mean_NDVI
#1    0.3272
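To confirm that masking is the culprit, a small base-R helper can report which package's copy of a function an unqualified call resolves to. This is a minimal sketch; `which_pkg` is a hypothetical helper, not part of dplyr, plyr or base R. It relies only on `get()`, `environment()`, `environmentName()` and `utils::find()`.

```r
# Hypothetical helper: name of the package whose copy of `fun_name` an
# unqualified call from the global environment would use (first match on
# the search path wins). Returns character(0) for primitives such as sum.
which_pkg <- function(fun_name) {
  f <- get(fun_name, mode = "function")
  environmentName(environment(f))
}

# Every attached package that provides a function called "summarise",
# listed in lookup order; the first entry is the one actually called.
find("summarise", mode = "function")

which_pkg("mean")       # a base-R closure, so this reports "base"
which_pkg("aggregate")  # lives in the stats namespace
```

With both packages attached and plyr attached last, `which_pkg("summarise")` would be expected to report `"plyr"`. Qualifying the call as `dplyr::summarise`, or attaching plyr before dplyr so that dplyr's version comes first on the search path, avoids the problem.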

data

NDVIplot_long <- structure(list(Date = c(5309L, 5405L, 5613L, 5677L, 
         5693L), year = c(1984L, 
1984L, 1985L, 1985L, 1985L), month_num = c(7L, 10L, 5L, 7L, 8L
), Season = c("Transition", "Dry", "Transition", "Transition", 
"Dry"), site = c("M1CAH1SUR", "M1CAH1SUR", "M1CAH1SUR", "M1CAH1SUR",   
"M1CAH1SUR"), NDVI = c(0.317, 0.208, 0.48, 0.316, 0.315),
   site_season = c("M1CAH1SUR_Transition", 
"M1CAH1SUR_Dry", "M1CAH1SUR_Transition", "M1CAH1SUR_Transition", 
"M1CAH1SUR_Dry")), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5"))
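As a plyr-independent alternative, base R's `aggregate()` can also collapse the data to one mean per site-year. According to the question, this is the shape community_stability needs. The sketch below uses only the five example rows from the question. It also computes the per-year sum for contrast, showing why leaving several rows per year in place inflates the yearly value. The codyn call is left as a comment; its argument names are my reading of codyn's documentation, so check `?community_stability` before relying on them.

```r
# The five example rows from the question (one site, two years)
ndvi <- data.frame(
  site = "M1CAH1SUR",
  year = c(1984L, 1984L, 1985L, 1985L, 1985L),
  NDVI = c(0.317, 0.208, 0.480, 0.316, 0.315)
)

# Summing within each site-year: 1985 becomes about 1.111 simply
# because that year has three scenes instead of two
summed <- aggregate(NDVI ~ site + year, data = ndvi, FUN = sum)

# Averaging within each site-year: one comparable value per year
averaged <- aggregate(NDVI ~ site + year, data = ndvi, FUN = mean)
print(averaged)

# Assumed next step (not run here; requires the codyn package):
# codyn::community_stability(averaged, time.var = "year",
#                            abundance.var = "NDVI", replicate.var = "site")
```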
