
I have a data frame of protein tests, where `challenge` is the known concentration and `result_ug` is the analytical result from a candidate instrument.

library(tidyverse)
library(broom)
library(rstatix)

# minimal dataset
protein.df <- structure(list(
  challenge = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
                        .Label = c("0", "2.5", "5", "10"), class = "factor"),
  result_ug = c(0.612, 0.392, 0, 6.949, 4.027, 5.41, 8.328, 6.402, 10.717,
                20.416, 17.03, 20.863)),
  row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))

protein.df

# A tibble: 12 × 2
   challenge result_ug
   <fct>         <dbl>
 1 0             0.612
 2 0             0.392
 3 0             0    
 4 2.5           6.95 
 5 2.5           4.03 
 6 2.5           5.41 
 7 5             8.33 
 8 5             6.40 
 9 5            10.7  
10 10           20.4  
11 10           17.0  
12 10           20.9  

I want to run one-sample t tests comparing the results from the new method (`result_ug`) against the known challenge concentration (`challenge`). That means looping through each challenge concentration, extracting the data for that concentration, and performing a one-sample t test with `mu` set to that concentration. This is how I would do it with a for loop:

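# non-tidyverse (for loop) method of analysis
for (i in c(0, 2.5, 5, 10)) {
  # rows for this challenge level; pull() returns a plain numeric vector
  x <- protein.df %>% filter(challenge == i) %>% pull(result_ug)
  # one-sample t test against the known concentration i
  print(t.test(x, mu = i))
}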

which gives the following results:

challenge estimate p.value
--------- -------- -------
0            0.335 0.2024
2.5          5.46  0.07246
5            8.48  0.108
10          19.4   0.01605
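The table above condenses the printed `t.test()` output. A minimal sketch of one alternative: store each test as a one-row tibble via `broom::tidy()` and bind them together, which produces that summary table directly instead of printing four separate reports. The names `results` and `loop_summary` are hypothetical, and the data are rebuilt with `tibble()` so the block runs on its own.

```r
library(dplyr)
library(broom)

# Same values as protein.df above, rebuilt with tibble() so this block is self-contained
protein.df <- tibble(
  challenge = factor(rep(c("0", "2.5", "5", "10"), each = 3),
                     levels = c("0", "2.5", "5", "10")),
  result_ug = c(0.612, 0.392, 0, 6.949, 4.027, 5.41,
                8.328, 6.402, 10.717, 20.416, 17.03, 20.863)
)

results <- list()
for (i in c(0, 2.5, 5, 10)) {
  # plain numeric vector of results for this challenge level
  x <- protein.df %>%
    filter(challenge == as.character(i)) %>%
    pull(result_ug)
  # tidy() turns the htest object into a one-row tibble
  results[[as.character(i)]] <- tidy(t.test(x, mu = i))
}

# .id stores each list name (the challenge level) as a column
loop_summary <- bind_rows(results, .id = "challenge") %>%
  select(challenge, estimate, p.value)
loop_summary
```

Collecting into a list and binding once at the end avoids growing a data frame inside the loop.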

However, when I try to do this as a tidyverse pipe, to avoid the for loop:

# this code is based on
# https://stackoverflow.com/questions/51074328/perform-several-t-tests-simultaneously-on-tidy-data-in-r
protein.df %>%
  group_by(challenge) %>%
  summarise(ttest = list(t.test(result_ug, mu = min(unique(as.numeric(challenge)))))) %>%
  mutate(ttest = map(ttest, tidy)) %>%
  unnest(cols = c(ttest)) %>%
  select(challenge, estimate, p.value)

I get:

challenge estimate p.value
--------- -------- -------
0            0.335 0.0654 
2.5          5.46  0.0546 
5            8.48  0.0481 
10          19.4   0.00609

The p-values differ in the second set. I suspect the `challenge` value is not being passed correctly, but I am struggling to work out how to fix this. Any help with where I am going wrong is appreciated.
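One way to check that suspicion is to evaluate the `mu` expression from the pipe by itself for each group. A minimal sketch, assuming the same data (rebuilt with `tibble()` so the block is self-contained; `mu_check` is a hypothetical name):

```r
library(dplyr)

# Same values as protein.df above
protein.df <- tibble(
  challenge = factor(rep(c("0", "2.5", "5", "10"), each = 3),
                     levels = c("0", "2.5", "5", "10")),
  result_ug = c(0.612, 0.392, 0, 6.949, 4.027, 5.41,
                8.328, 6.402, 10.717, 20.416, 17.03, 20.863)
)

# Evaluate the pipe's mu expression on its own, once per challenge group
mu_check <- protein.df %>%
  group_by(challenge) %>%
  summarise(mu_used = min(unique(as.numeric(challenge))))
mu_check

# The "0" group tested against mu = 1 instead of mu = 0
t.test(c(0.612, 0.392, 0), mu = 1)$p.value
```

`mu_used` comes back as 1, 2, 3, 4 rather than 0, 2.5, 5, 10, because `as.numeric()` on a factor returns its internal level codes. The last line confirms that testing the `0` group against `mu = 1` reproduces the 0.0654 in the first row of the pipe's output.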

PJP
  • Use `mu` as `mu = first(as.numeric(as.character(challenge)))))`. Read https://stackoverflow.com/questions/3418128/how-to-convert-a-factor-to-integer-numeric-without-loss-of-information – Ronak Shah Jan 13 '22 at 11:18
  • @RonakShah - that's perfect - can you explain why that works? – PJP Jan 13 '22 at 12:01
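Why it works, as far as standard R factor behaviour goes: a factor stores integer codes (1, 2, 3, 4 here) plus a vector of labels (`"0"`, `"2.5"`, `"5"`, `"10"`). `as.numeric()` on the factor returns the codes, whereas `as.character()` returns the labels, which `as.numeric()` can then parse into 0, 2.5, 5, 10. Since every row in a group carries the same label, `first()` (like the original `min(unique(...))`) just reduces that to the single value `mu` needs. A sketch of the full corrected pipe, with the data rebuilt via `tibble()` and `pipe_summary` as a hypothetical name:

```r
library(dplyr)
library(purrr)
library(tidyr)
library(broom)

# Same values as protein.df above
protein.df <- tibble(
  challenge = factor(rep(c("0", "2.5", "5", "10"), each = 3),
                     levels = c("0", "2.5", "5", "10")),
  result_ug = c(0.612, 0.392, 0, 6.949, 4.027, 5.41,
                8.328, 6.402, 10.717, 20.416, 17.03, 20.863)
)

pipe_summary <- protein.df %>%
  group_by(challenge) %>%
  summarise(ttest = list(
    # as.character() recovers the label ("2.5"), as.numeric() parses it (2.5);
    # first() keeps one value, since every row in the group shares that label
    t.test(result_ug, mu = first(as.numeric(as.character(challenge))))
  )) %>%
  mutate(ttest = map(ttest, tidy)) %>%
  unnest(cols = c(ttest)) %>%
  select(challenge, estimate, p.value)
pipe_summary
```

The p-values now match the for-loop results. R FAQ 7.10 also describes `as.numeric(levels(f))[f]` as an equivalent, slightly more efficient conversion.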
