I want to run a two-sided Wilcoxon test comparing two treatments within each of several groups, i.e. there are before- and after-treatment measurements (Conc) for each of several sample sites. I want to split the dataset into a list by Site and then apply the test, so that I get a separate output for each Site. However, I am having trouble setting this up as a function that can be repeated across sites.

I have a number of sites (Site) and two levels of treatment (Scenario), with resulting scores (Conc):

'data.frame':   7344 obs. of  6 variables:
 $ Site        : chr  "A" "B" "C" "D" ...
 $ Scenario    : chr  "1" "1" "1" "1" "2" "2" "2" "2" ...
 $ Conc        : num  4.7727 0.055 0.0552 0.055 0.055 ...

There are multiple Conc data points (~60) within each Site/Scenario combination. I chose a Wilcoxon test primarily because I have slightly uneven sample numbers between treatments (Scenario) for each Site.

When I use this code for the entire dataset I get a sensible result:

t1 <- wilcox.test(Conc ~ Scenario, data = data.frame)
t1

However, this code doesn't apply the test for each site individually.

I have looked at all the similar examples I could find (on SO and elsewhere), and this is the best code I could come up with:

t2 = data.frame %>% group_by(Site) %>% do(tidy(wilcox.test(Conc~Scenario, data=data.frame), na.rm=TRUE, equal.var=FALSE))
t2

This code gives me an output for each site, but all the test outputs are identical, even the p-value:

# A tibble: 107 x 5
# Groups:   Site [107]
   Site     statistic p.value method                                      alternative
   <chr>       <dbl>   <dbl> <chr>                                             <chr>      
 1 A         6145702   0.690 Wilcoxon rank sum test with continuity correction two.sided  
 2 B         6145702   0.690 Wilcoxon rank sum test with continuity correction two.sided  
 3 C         6145702   0.690 Wilcoxon rank sum test with continuity correction two.sided  
 4 D         6145702   0.690 Wilcoxon rank sum test with continuity correction two.sided  
 5 E         6145702   0.690 Wilcoxon rank sum test with continuity correction two.sided  
 6 F         6145702   0.690 Wilcoxon rank sum test with continuity correction two.sided  

Can anyone see what I'm doing wrong? Thanks for your help.

CatN
  • You could try `lapply(split(data.frame, data.frame$Site), function(x) wilcox.test(Conc ~ Scenario, data = x))` to get a list of Wilcoxon tests across all your sites – Allan Cameron Aug 19 '20 at 12:43
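
The split/lapply suggestion in the comment above can be sketched end to end as follows. The data are simulated: the name `dat`, the three sites, and the 30 observations per Site/Scenario cell are assumptions for illustration, not the asker's data. The point is only that each element of the split list is handed to `wilcox.test` as its own `data` argument, so every site gets its own test.

```r
set.seed(1)

# Simulated stand-in for the real data (assumption):
# 3 sites x 2 scenarios x 30 concentrations each
dat <- data.frame(
  Site     = rep(c("A", "B", "C"), each = 60),
  Scenario = rep(rep(c("1", "2"), each = 30), times = 3),
  Conc     = rlnorm(180)
)

# One test per site: each x holds only the rows for a single Site
tests <- lapply(split(dat, dat$Site), function(x) {
  wilcox.test(Conc ~ Scenario, data = x)
})

# Collect the per-site statistics and p-values into one data frame
results <- data.frame(
  Site      = names(tests),
  statistic = vapply(tests, function(t) unname(t$statistic), numeric(1)),
  p.value   = vapply(tests, function(t) t$p.value, numeric(1)),
  row.names = NULL
)
results
```

`tests` keeps the full `htest` object for each site (e.g. `tests$A`), while `results` is a compact one-row-per-site summary.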

1 Answer

EDITED 21/08/2020 to more closely mirror your data

Here's a solution with dplyr and purrr, edited to include broom::tidy results:

# 'data.frame': 5626 obs. of 3 variables: 
# $ Site.Year: Factor w/ 3 levels "Baffle Creek at Newton Road_2018_2019",..: 1 1 1 1 1 1 1 1 1 1 ... 
# $ Scenario : chr "FF_Total" "FF_Total" "FF_Total" "FF_Total" ... 
# $ PAF : num 4.77 4.77 4.77 4.77 4.77

set.seed(2020)

Site.Year <- rep(c("Baffle Creek at Newton Road_2018_2019", 
                   "Baffle Creek at Newton Road_2017_2018", 
                   "Baffle Creek at Newton Road_2019_2020"), 50)
Scenario <- rep_len(c(rep("FF_Total", 4), rep("Not_FF_Total", 4)), 150)
PAF <- rnorm(150, mean = 2.5, sd = 1)

DailyPAF_long <- data.frame(Site.Year, Scenario, PAF)

DailyPAF_long$Site.Year <- factor(DailyPAF_long$Site.Year)
# str(DailyPAF_long)
# wilcox.test(PAF ~ Scenario, data = DailyPAF_long)

library(dplyr)
library(purrr)

DailyPAF_long %>% 
  base::split(.$Site.Year) %>%  # split on the column itself, not a same-named global vector
  purrr::map(~ wilcox.test(PAF ~ Scenario, data = .)) %>% 
  purrr::map_dfr(~ broom::tidy(.)) 

#> # A tibble: 3 x 4
#>   statistic p.value method                       alternative
#>       <dbl>   <dbl> <chr>                        <chr>      
#> 1       361  0.355  Wilcoxon rank sum exact test two.sided  
#> 2       219  0.0723 Wilcoxon rank sum exact test two.sided  
#> 3       380  0.195  Wilcoxon rank sum exact test two.sided
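
As for why the attempt in the question returned the same statistic and p-value for every site: inside `do()`, `data = data.frame` still refers to the full, ungrouped data frame, so every group reruns the whole-dataset test. Below is a minimal sketch of the same pipeline with the current group passed as `.` instead. It runs on simulated data (`dat` and its contents are made up for illustration). The `na.rm` and `equal.var` arguments are left out because they were being passed to `tidy()`, which does not appear to use them.

```r
library(dplyr)
library(broom)

set.seed(1)

# Simulated stand-in for the real data (assumption):
# 3 sites x 2 scenarios x 30 concentrations each
dat <- data.frame(
  Site     = rep(c("A", "B", "C"), each = 60),
  Scenario = rep(rep(c("1", "2"), each = 30), times = 3),
  Conc     = rlnorm(180)
)

# Inside do(), `.` is the subset of rows for the current Site,
# so each test only sees that site's data
per_site <- dat %>%
  group_by(Site) %>%
  do(tidy(wilcox.test(Conc ~ Scenario, data = .)))

per_site
```

The result has one row per site, with the `Site` column retained alongside the tidied test output.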
Chuck P
  • Hi Chuck, thanks so much for your answer! I am trying your code but I keep getting the error "object 'Site' not found". Do you think I need to convert any of the variables to character? 'data.frame': 5626 obs. of 3 variables: $ Site: Factor w/ 3 levels "A",..: 1 1 1 1 1 1 1 1 1 1 ... $ Scenario : Factor w/ 2 levels "F","N": 1 1 1 1 1 1 1 1 1 1 ... $ Score : num 4.77 4.77 4.77 4.77 4.77 ... – CatN Aug 20 '20 at 01:42
  • I've rebooted R out of desperation and it still doesn't work.. – CatN Aug 20 '20 at 01:48
  • No, factors will work. May I see the exact command you're running, please? Judging from the error message, it is very likely just a typo in the command. – Chuck P Aug 20 '20 at 13:36
  • 'data.frame': 5626 obs. of 3 variables: $ Site.Year: Factor w/ 3 levels "Baffle Creek at Newton Road_2018_2019",..: 1 1 1 1 1 1 1 1 1 1 ... $ Scenario : chr "FF_Total" "FF_Total" "FF_Total" "FF_Total" ... $ PAF : num 4.77 4.77 4.77 4.77 4.77 ... – CatN Aug 21 '20 at 06:27
  • DailyPAF_long %>% split(Site.Year) %>% map(~ wilcox.test(PAF ~ Scenario, data = .)) %>% map_dfr(~ broom::tidy(.)) – CatN Aug 21 '20 at 06:27
  • Thank you. I'm going to edit my answer to more closely mirror your exact situation. **BUT** the most likely culprit is the fact the error message says *"object 'Site' not found..."*. Since the variable is actually `Site.Year` it is strong evidence that when you grabbed my solution you failed to make that change... – Chuck P Aug 21 '20 at 14:03
  • Fantastic, thanks Chuck! I also added df$ in front of each variable, which seemed to help... – CatN Aug 24 '20 at 01:33
  • A year+ later and today I would use: `DailyPAF_long %>% group_by(Site.Year) %>% group_map(~ wilcox.test(PAF ~ Scenario, data = .)) %>% set_names(nm = DailyPAF_long %>% group_by(Site.Year) %>% group_keys() %>% pull()) %>% purrr::imap_dfr(~ broom::tidy(.x), .id = "Site.Year")` – Chuck P Oct 07 '22 at 18:53