1

I am willing to calculate the standardized effect size for soil N among 2 soil water levels (optimal, reduced) and two diversity levels (High versus Low) with 5 replicates each. I am using the formula ES = (soil_N optimal - Soil_N reduced) / (soil_N optimal + soil_N reduced), separately for each replicate and diversity level (High and Low) that I can use to plot a graph. Would it be possible to do this calculation without using loops? I am looking forward to a dplyr/tidyverse solution to this problem. I am very much looking forward to your simple answers.

df<- data.frame(Soilwater = c("optimal", "optimal", "optimal", "optimal", "optimal", 
   "reduced", "reduced", "reduced", "reduced", "reduced", 
   "optimal", "optimal", "optimal", "optimal", "optimal", 
   "reduced", "reduced", "reduced", "reduced", "reduced"), 
  Diversity = c("High","High","High","High","High","High","High","High","High","High",   
   "Low", "Low", "Low","Low","Low","Low","Low","Low","Low","Low"),
 Soil_N = c(50,45, 49, 48, 49, 69, 68, 69, 70, 67, 79, 78, 79, 78, 77, 89, 89, 87, 88, 89), 
 Replicate = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5))

For ex., for high diversity, the effect size can be calculated as (50-79)/(50+79) = -0.2248, (45-78)/(45-78) = -0.2682, (49-79)/(49+79) = -0.2343, (48-78)/(48+78) = -0.2381, (49-77)/(49+77) = -0.2222. the average of these 5 replicates is -0.2375. same should be done for low diversity level. I have used the following code but I am getting the value for only 1 replicate and not for all 5 replicates per treatment. Any help would be highly appreciated!

df%>%  
rowwise()%>%    
dplyr::mutate(Replicate = row_number())%>%    
dplyr::group_by(Soilwater, Diversity, Replicate)%>%     
dplyr::summarise(Soil_N = mean(Soil_N))%>% 
tidyr::spread(key = (Soilwater), value = Soil_N)%>%    
dplyr::mutate(ES = (Optimal-Reduced)/(Optimal+Reduced))
Amit
  • 37
  • 5

1 Answers1

0

If I'm understanding your question, I think it's just a case of reordering your functions, with a couple of notes:

  • Your manual calculations seem to be subtracting low-diversity-optimal from high-diversity-optimal (50-79; so grouping by Soilwater) but your question seems to want to group by Diversity? Have proceeded with the second of these, but can be easily changed
  • I used pivot_wider instead of spread - does the same thing, but I find it easier to understand!
  • Your rowwise() %>% mutate(Replicate = row_number()) part was just assigning all rows the Replicate value of 1, so grouping them all as one observation. Don't think that's what you were aiming to do so have taken out.

Edited to deal with missing variables

One option for accounting for NAs is simply to use mean(ES, na.rm = TRUE) in calculating the mean - effectively removing missing rows from calculations:

a)

library(dplyr)
library(tidyr)


df <- tibble(
  Soilwater = rep(rep(c("optimal", "reduced"), each = 5), times = 2),
  Diversity = rep(c("high", "low"), each = 10),
  Soil_N = c(50, 45, 49, 48, 49, 69, 68, 69, 70, 67, 79, 78, 79, NA_integer_, 77, 89, 89, 87, 88, 89),
  Replicate = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
)

df %>%
  pivot_wider(
    id_cols = c(Diversity, Replicate),
    names_from = Soilwater,
    values_from = Soil_N
  ) %>%
  mutate(ES = (optimal - reduced) / (optimal + reduced)) %>%
  group_by(Diversity) %>%
  summarise(mean_ES = mean(ES, na.rm = TRUE))

#> # A tibble: 2 x 2
#>   Diversity mean_ES
#>   <chr>       <dbl>
#> 1 high      -0.175 
#> 2 low       -0.0615

Another option would be to impute missing values from the mean of the combined Soilwater/Diversity group for each measurement of optimal and reduced.

b)

df %>%
  pivot_wider(
    id_cols = c(Diversity, Replicate),
    names_from = Soilwater,
    values_from = Soil_N
  ) %>%
  group_by(Diversity) %>%
  mutate(across(c(optimal, reduced), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
    ES = (optimal - reduced) / (optimal + reduced)
  ) %>%
  summarise(mean_ES = mean(ES, na.rm = TRUE))

#> # A tibble: 2 x 2
#>   Diversity mean_ES
#>   <chr>       <dbl>
#> 1 high      -0.175 
#> 2 low       -0.0609

Created on 2022-03-17 by the reprex package (v2.0.1)

Andy Baxter
  • 5,833
  • 1
  • 8
  • 22
  • Thanks @Andy! Exactly what I was looking for! Would it also be possible to work with NAs in the formula? Imagine I have only one NA in **optimal** for **low diversity** and then instead of using the individual replicate values in the ES formula, I would like to use the **mean** of (optimal+reduced) in the denominator while keeping individual replicates for numerator **(to avoid reduction in the number of replicates for ES)**. Looking forward to hear from you. Thanks – Amit Mar 17 '22 at 09:53
  • 1
    Hi Amrit! I'm not quite sure how this would work? If `Soil_N` was NA for one individual replicate then that would prevent the calculation of both numerator and denominator for that row? You could a) replace all NA values with the mean for that `Soilwater`/`Diversity` combination before calculating individual `ES` values, or b) use `na.rm = TRUE` in the `mean()` call to get a mean across non-missing values. I can edit the solution to do one of those if that suits? – Andy Baxter Mar 17 '22 at 10:23
  • Option **b)** would be great to know. But if there is only one line of code for option **a)**, I would be really happy to know that too. Everything is coming in steps for me and now I would like to know if we could also include **bootstrap re-sampling with 1000 iterations** while calculating the ES? Thank you very much for your help! Really appreciate! – Amit Mar 17 '22 at 11:34
  • 1
    Hi Amrit, have added both ways above to show what they each do. See if that helps! The short answer to your bootstrapping question is "yes, you can!". The long answer probably needs a fresh question with the code of what you've tried already to achieve bootstrap resampling :D. Stackoverflow works best to do one problem per question, otherwise solutions get a bit messy! Would suggest marking this as answered if it's managed the SE calculation sufficiently and I'll keep an eye out for further qs! – Andy Baxter Mar 17 '22 at 12:05
  • 1
    This was very helpful! Thank you very much! The only modification I needed was that I did not use `summarise` at the end because I wanted to have individual values of all 5 replicates per diversity which I can use to make a dot plot with standard deviation with ggplot. It would be really great to hear your feedback on my next question where I ask about the bootstrap on the same dataset as you are now very familiar with my query. Looking forward to your quick response for that question too. Many thanks! – Amit Mar 17 '22 at 12:37
  • Glad to be able to help! :) – Andy Baxter Mar 17 '22 at 12:39
  • 1
    Thanks! Just asked the follow-up question here [https://stackoverflow.com/questions/71512747/calculating-bootstrap-resampling-for-grouped-variables]. Looking forward to your comment. – Amit Mar 17 '22 at 13:08