I have a data.frame in R with columns of most types like this:

df <- data.frame(
  ID = c(1, 2, 3, 4),
  Gender = c("Male", "Male", "Female", "Male"),
  Average_Score_Test_1 = c(1.2, 2.4, 3.2, 1.8),
  Average_Score_Test_2 = c(3.2, 2.8, 1.7, 2.5),
  Qualification = c("UG", "UG", "UG", "PG")
)

though with thousands of columns and rows. I have several vectors holding the names of groups of columns, e.g.

DV_Type1 <- c("Average_Score_Test_1", "Average_Score_Test_2")

and the same for grouping variables

Type1_Group <- c("Gender", "Qualification")

I then run a nested for loop over the elements of each vector to run significance tests, etc.

This runs perfectly for kruskal_test, e.g.

df %>%
  kruskal_test(df[[DV_Type1[1]]] ~ df[[Type1_Group[1]]])

But with exactly the same code, using wilcox_test instead of kruskal_test, I get

df %>%
  wilcox_test(df[[DV_Type1[1]]] ~ df[[Type1_Group[1]]])

Error: Can't extract columns that don't exist. The column 'Type1_Group[1]' doesn't exist

Why is this not working?

I am using rstatix in order to get the results in a tibble.

jenjonliz
  • Welcome to SO. It's easier to help you if you make your question reproducible including data and your code which can be used to test and verify possible solutions. Have a look at https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5 & https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example? – Peter May 21 '20 at 08:13
  • I'm just guessing: Try replacing `df[[DV_Type1[1]]] ~ df[[Type1_Group[1]]]` with `df[DV_Type1[1],] ~ df[Type1_Group[1],]`. – Martin Gal May 21 '20 at 08:56

1 Answer

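You are passing the column vectors directly into the formula, without any regard for the environment in which the formula is evaluated. The error message suggests that wilcox_test looks up the formula terms as column names in the data, so it searches for a column literally called 'Type1_Group[1]'. I am not very sure how that works inside rstatix.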
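
To illustrate the point about names versus values, here is a minimal sketch that builds the formula from the stored column names with base R's reformulate(), so the test receives plain column names and finds them through the data argument. It uses base wilcox.test so that it runs without extra packages. Passing the same formula to rstatix's wilcox_test is an assumption and is shown only as a comment.

```r
# Data and name vectors copied from the question
df <- data.frame(
  ID = c(1, 2, 3, 4),
  Gender = c("Male", "Male", "Female", "Male"),
  Average_Score_Test_1 = c(1.2, 2.4, 3.2, 1.8),
  Average_Score_Test_2 = c(3.2, 2.8, 1.7, 2.5),
  Qualification = c("UG", "UG", "UG", "PG")
)
DV_Type1 <- c("Average_Score_Test_1", "Average_Score_Test_2")
Type1_Group <- c("Gender", "Qualification")

# Build the formula Average_Score_Test_1 ~ Gender from the stored names
f <- reformulate(Type1_Group[1], response = DV_Type1[1])

# The formula now refers to columns by name, resolved through `data`
fit <- wilcox.test(f, data = df)
fit$p.value

# Assumption (not verified here): the same formula object can be handed
# to rstatix, e.g.  df %>% wilcox_test(f)
```

Inside a loop, the indices 1 and 1 would simply be replaced by the loop variables.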

I can suggest a solution below that keeps track of your data, but of course, nothing beats the incredible nested for loop.

First we nest the data for each measurement. I suppose you have a lot of measurements and the same two independent variables, so it should be OK:

library(tidyr)
library(dplyr)
library(broom)
library(purrr)

df_bymeasure = df %>%
  pivot_longer(cols = contains("Average_Score")) %>%
  nest(data = c(ID, Gender, Qualification, value))

# A tibble: 2 x 2
  name                 data            
  <chr>                <list>          
1 Average_Score_Test_1 <tibble [4 × 4]>
2 Average_Score_Test_2 <tibble [4 × 4]>

Now we can do the test we want:

res = df_bymeasure %>%
  mutate(
    kw_Gender = map(data, ~ tidy(kruskal.test(value ~ Gender, data = .x))),
    wi_Gender = map(data, ~ tidy(wilcox.test(value ~ Gender, data = .x))),
    kw_Qualification = map(data, ~ tidy(kruskal.test(value ~ Qualification, data = .x))),
    wi_Qualification = map(data, ~ tidy(wilcox.test(value ~ Qualification, data = .x)))
  )
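
With many grouping columns, writing one mutate line per variable gets long. As a hedged alternative, the sketch below loops over every pair taken from the question's DV_Type1 and Type1_Group vectors and returns one row per test. The names pairs and res_all are made up for illustration, and the sketch only covers wilcox.test, whose formula interface needs a grouping column with exactly two levels.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Data and name vectors copied from the question
df <- data.frame(
  ID = c(1, 2, 3, 4),
  Gender = c("Male", "Male", "Female", "Male"),
  Average_Score_Test_1 = c(1.2, 2.4, 3.2, 1.8),
  Average_Score_Test_2 = c(3.2, 2.8, 1.7, 2.5),
  Qualification = c("UG", "UG", "UG", "PG")
)
DV_Type1 <- c("Average_Score_Test_1", "Average_Score_Test_2")
Type1_Group <- c("Gender", "Qualification")

# One row per (outcome, grouping variable) combination
pairs <- expand_grid(dv = DV_Type1, group = Type1_Group)

# Build each formula from the two names, run the test, tidy the result
res_all <- pairs %>%
  mutate(test = map2(dv, group, function(d, g) {
    tidy(wilcox.test(reformulate(g, response = d), data = df))
  })) %>%
  unnest(test)

res_all
```

Swapping wilcox.test for kruskal.test in the same loop should give the Kruskal-Wallis results in the same shape.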

Then you can look at one set of results, for example the Kruskal-Wallis test for Gender:

res %>% unnest(kw_Gender)
# A tibble: 2 x 9
  name  data  statistic p.value parameter method wi_Gender kw_Qualification
  <chr> <lis>     <dbl>   <dbl>     <int> <chr>  <list>    <list>          
1 Aver… <tib…       1.8   0.180         1 Krusk… <tibble … <tibble [1 × 4]>
2 Aver… <tib…       1.8   0.180         1 Krusk… <tibble … <tibble [1 × 4]>
# … with 1 more variable: wi_Qualification <list>
StupidWolf