I have a data frame with the columns fruits, ripeness, and mean.
How can I create a for loop that runs a t-test to determine the mean difference between ripeness levels for EACH fruit? In other words, for apples, the t-test would produce the mean difference between ripe and unripe apples.
An example of this would look like the following table.
Something like this could work for returning p-values of the t-test comparing "Ripeness" as you loop through the unique "Fruits" that appear in your data.
```r
library(dplyr)  # provides filter(); without it, stats::filter() is called and fails

## create a vector of the unique fruit in the data; vector of fruit to be tested
fruit <- unique(data$Fruits)
## iterate through your list of unique fruit, testing as you go
for (i in seq_along(fruit)) {
  ## subset your data to include only the current fruit to be tested
  df <- filter(data, Fruits == fruit[i])
  ## let the user know which fruit is being tested
  message(fruit[i])
  ## create a vector of the unique ripeness states of the current fruit to be tested
  ripe <- unique(df$Ripeness)
  ## make sure two means exist; ensure there are both ripe and non-ripe values
  if (length(ripe) < 2) {
    ## if only one ripeness, let user know and skip to next unique fruit
    message("only one ripeness")
    next
  }
  ## try testing the fruit and return p-value if success
  tryCatch(
    {
      message(t.test(Mean ~ Ripeness, data = df)$p.value)
    },
    ## if error in t-testing return message that there are "not enough observations"
    error = function(cond) {
      message("not enough observations")
    }
  )
}
```
I hope this helps!

- Thanks for the input! I'm trying to plug this in, but it doesn't seem to work for the fruits that have only one type of ripeness ("yes") listed. Is there a way to work around this while still working for the fruits that have two ripeness categories? – NickL Jun 04 '20 at 17:55
- The above answer has been updated to include a check to ensure there are two types of "Ripeness" before comparing means! Best wishes! – Joshua Mire Jun 04 '20 at 18:51
- Thanks again! I ran into another problem that I was hoping you could give some input on. The `t.test()` call runs into an error when there aren't enough observations. Is there an if statement that could be used so that, if it runs into an error, it skips that result and moves on to the next valid t-test? – NickL Jun 04 '20 at 20:59
- My updated code wraps the `t.test()` call in the `tryCatch()` function. The `tryCatch()` function can be used for catching errors and warnings. A great `tryCatch()` explanation can be found [here](https://stackoverflow.com/questions/12193779/how-to-write-trycatch-in-r). – Joshua Mire Jun 04 '20 at 23:59
- I have a follow-up question regarding the results of this for loop. Assuming I initialized a data frame called finalOut outside of the for loop, how can I store the results of the t-test so that finalOut has one column named Fruits and another with the p-values of the t-test? – NickL Jun 09 '20 at 18:07
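One possible way to fill `finalOut` (a sketch, not something posted in the thread) is to create it before the loop with one row per fruit and write each p-value into that fruit's row, with `tryCatch()` returning `NA` instead of printing a message. The toy `data` below is hypothetical and only uses the `Fruits`, `Ripeness`, and `Mean` column names assumed by the loop above; base-R subsetting stands in for `dplyr::filter()` so the block runs without extra packages.

```r
## Hypothetical toy data with the column names used in the loop above
## (Fruits, Ripeness, Mean); replace with your own data frame.
data <- data.frame(
  Fruits   = c(rep("Apple", 6), rep("Banana", 3), rep("Orange", 5)),
  Ripeness = c("yes", "yes", "yes", "no", "no", "no",   # Apple: both states
               "yes", "yes", "yes",                     # Banana: only ripe
               "yes", "yes", "yes", "yes", "no"),       # Orange: one unripe
  Mean     = c(5, 7, 6, 2, 3, 4,
               8, 9, 7,
               4, 6, 5, 7, 3)
)

fruit <- unique(data$Fruits)

## Initialise finalOut outside the loop: one row per fruit, p.value starts as NA
finalOut <- data.frame(Fruits = fruit, p.value = NA_real_)

for (i in seq_along(fruit)) {
  df <- data[data$Fruits == fruit[i], ]
  ## Skip fruits with only one ripeness state (their p.value stays NA)
  if (length(unique(df$Ripeness)) < 2) next
  ## Store the p-value, or NA if t.test() errors (e.g. a group with 1 observation)
  finalOut$p.value[i] <- tryCatch(
    t.test(Mean ~ Ripeness, data = df)$p.value,
    error = function(cond) NA_real_
  )
}

finalOut
```

Fruits that are skipped or whose test fails keep `NA` in `p.value`, so `finalOut` always has exactly one row per fruit and can be filtered afterwards with `finalOut[!is.na(finalOut$p.value), ]`.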
Assuming `fruits` is coded as a categorical variable (i.e. a `factor`, as it should be), you could use `sapply` to iteratively subset the data by each fruit. In `t.test` we use `alternative="two.sided"` just for emphasis, although it's the default setting.
However, your data is very small and the Bananas are only ripe, so I use a larger sample data set to demonstrate.
```r
res <- sapply(levels(dat$fruits), function(x)
  t.test(mean ~ ripeness, dat[dat$fruits %in% x, ], alternative="two.sided")
)
res
# Apple Banana Orange
# statistic 0.948231 0.3432062 0.4421971
# parameter 23.38387 30.86684 16.47366
# p.value 0.3527092 0.7337699 0.664097
# conf.int Numeric,2 Numeric,2 Numeric,2
# estimate Numeric,2 Numeric,2 Numeric,2
# null.value 0 0 0
# stderr 0.8893453 1.16548 1.043739
# alternative "two.sided" "two.sided" "two.sided"
# method "Welch Two Sample t-test" "Welch Two Sample t-test" "Welch Two Sample t-test"
# data.name "mean by ripeness" "mean by ripeness" "mean by ripeness"
```
Data:
```r
set.seed(42)
n <- 1e2
dat <- data.frame(fruits=factor(sample(1:3, n, replace=TRUE),
                                labels=c("Apple", "Banana", "Orange")),
                  ripeness=factor(rbinom(n, 1, .4), labels=c("yes", "no")),
                  mean=round(runif(n)*10))
```
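If only the p-values are needed, a variant of the `sapply()` call above can return just the `$p.value` element of each test, which gives a named numeric vector instead of a matrix of lists. This sketch regenerates the same simulated `dat`; the name `pvals`, the switch to `vapply()`, and the `tryCatch()` wrapper returning `NA` are additions (not part of the answer) so that a fruit with a single ripeness level, like the Bananas in the original data, does not stop the whole call.

```r
## Same simulated data as above
set.seed(42)
n <- 1e2
dat <- data.frame(fruits = factor(sample(1:3, n, replace = TRUE),
                                  labels = c("Apple", "Banana", "Orange")),
                  ripeness = factor(rbinom(n, 1, .4), labels = c("yes", "no")),
                  mean = round(runif(n) * 10))

## vapply() guarantees exactly one numeric value per fruit, named by fruit
pvals <- vapply(levels(dat$fruits), function(x) {
  tryCatch(
    t.test(mean ~ ripeness, data = dat[dat$fruits == x, ],
           alternative = "two.sided")$p.value,
    ## e.g. "grouping factor must have exactly 2 levels" -> NA instead of an error
    error = function(cond) NA_real_
  )
}, numeric(1))

pvals

## The same results as a two-column data frame
data.frame(fruits = names(pvals), p.value = unname(pvals))
```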
Please note for the future that you should include a minimal self-contained example, including data in an appropriate format (never images; please read here on how to do that), and all the steps you've tried so far, since Stack Overflow is not a coding service. Cheers!
