Chi-square tests for different groups in an R dataframe

Question

I have a huge dataframe with the following basic structure:

data <- data.frame(species = factor(c(rep("species1", 4), rep("species2", 4), rep("species3", 4))),
                 trap = c(rep(c("A","B","C","D"), 3)),
                 count=c(6,3,7,9,5,3,6,6,5,8,1,3))
data

I want to run a chi-square test on the count data across the four traps for each species separately, but not between species. The following code does this for one species at a time, but because my original dataframe is huge, repeating it by hand for every species is not a practical solution.

chi_species1 <- xtabs(count~trap, data, 
                       subset = species=="species1")
chi_species1
chisq.test(chi_species1)

Thanks for your help!!

Yuriy Saraykin · Accepted Answer · 2022-04-25T16:18:15.650

base

df <- data.frame(species = factor(c(rep("species1", 4), rep("species2", 4), rep("species3", 4))),
                   trap = c(rep(c("A","B","C","D"), 3)),
                   count=c(6,3,7,9,5,3,6,6,5,8,1,3))
df
#>     species trap count
#> 1  species1    A     6
#> 2  species1    B     3
#> 3  species1    C     7
#> 4  species1    D     9
#> 5  species2    A     5
#> 6  species2    B     3
#> 7  species2    C     6
#> 8  species2    D     6
#> 9  species3    A     5
#> 10 species3    B     8
#> 11 species3    C     1
#> 12 species3    D     3

species <- unique(df$species)

chi_species <- lapply(species, function(x) xtabs(count~trap, df, 
                      subset = species== x))

chi_species <- setNames(chi_species, species)

lapply(chi_species, chisq.test)

#> $species1
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  X[[i]]
#> X-squared = 3, df = 3, p-value = 0.3916
#> 
#> 
#> $species2
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  X[[i]]
#> X-squared = 1.2, df = 3, p-value = 0.753
#> 
#> 
#> $species3
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  X[[i]]
#> X-squared = 6.2941, df = 3, p-value = 0.09815

Created on 2022-04-25 by the reprex package (v2.0.1)
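With many species, a printed list of test objects gets hard to scan. Below is a minimal sketch, not part of the original answer, that runs the same per-species goodness-of-fit tests and flattens them into one row per species. The names `tests` and `results` are only illustrative, and `split()` is swapped in for the `subset =` approach used above.

```r
df <- data.frame(
  species = factor(rep(c("species1", "species2", "species3"), each = 4)),
  trap    = rep(c("A", "B", "C", "D"), times = 3),
  count   = c(6, 3, 7, 9, 5, 3, 6, 6, 5, 8, 1, 3)
)

# One goodness-of-fit test per species (equal expected counts across traps)
tests <- lapply(split(df, df$species), function(d) {
  chisq.test(xtabs(count ~ trap, data = d))
})

# Flatten the list of "htest" objects into one row per species
results <- data.frame(
  species   = names(tests),
  statistic = sapply(tests, function(h) unname(h$statistic)),
  df        = sapply(tests, function(h) unname(h$parameter)),
  p.value   = sapply(tests, function(h) h$p.value),
  row.names = NULL
)
results
```

Note that `chisq.test()` warns that the approximation may be incorrect when an expected count falls below 5, which can happen for sparse species.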

tidyverse

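library(dplyr)

df %>% 
  group_by(species, trap) %>% 
  # total count per species/trap; keep the grouping by species
  summarise(count = sum(count), .groups = "drop_last") %>% 
  # goodness-of-fit test across traps within each species
  summarise(pvalue = chisq.test(count)$p.value)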

# A tibble: 3 × 2
  species  pvalue
  <fct>     <dbl>
1 species1 0.392 
2 species2 0.753 
3 species3 0.0981

Thank you :-) Can you tell me why the p-values from your method differ from those from Quinten's method? — Schneiderhansl, Apr 25 '22 at 15:10
Quinten's calculation is equivalent to `tmp <- df %>% filter(species == "species1")` followed by `chisq.test(x = tmp$count, y = tmp$trap)`. I implemented the test as described in your example. — Yuriy Saraykin, Apr 25 '22 at 15:26
I also added the `tidyverse` calculation code to the example — Yuriy Saraykin, Apr 25 '22 at 15:28
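
To make the difference discussed in these comments concrete, here is a small sketch using the species1 counts from the question (the variable names `counts`, `traps`, `gof` and `ind` are just for illustration). When `chisq.test()` receives two vectors, it coerces both to factors and cross-tabulates them, so it tests independence on a table of distinct values rather than testing the counts themselves.

```r
counts <- c(6, 3, 7, 9)            # species1 counts from the example data
traps  <- c("A", "B", "C", "D")

# One argument: goodness-of-fit test against equal expected counts per trap
gof <- chisq.test(counts)

# Two vectors: both are converted to factors and cross-tabulated, so this
# tests independence on a 4 x 4 table of distinct values, not the counts
ind <- chisq.test(counts, traps)

gof$p.value
ind$p.value
table(counts, traps)   # the table the second call is actually testing
```

The first call appears to be the one that matches the question, since the counts are treated as frequencies per trap.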

score 0 · Answer 2 · answered Apr 25 '22 at 14:43

You want something like this:

library(dplyr)
data %>% 
  group_by(species) %>% 
  summarise(pvalue= chisq.test(count, trap)$p.value)

Output:

# A tibble: 3 × 2
  species  pvalue
  <fct>     <dbl>
1 species1  0.213
2 species2  0.238
3 species3  0.213
