
I have a data frame with the columns fruits, ripeness, and mean. How can I write a for loop that runs a t-test on the difference in mean between ripeness levels for EACH fruit? For example, for apples, the t-test would compare the mean of ripe apples with the mean of unripe apples. The desired output would look like the following table. [Image: table example]

Phil
NickL

2 Answers


Something like this could work for returning p-values of the t-test comparing "Ripeness" as you loop through the unique "Fruits" that appear in your data.

## filter() below comes from dplyr
library(dplyr)
## create a vector of the unique fruit in the data; vector of fruit to be tested
fruit<-unique(data$Fruits)
## iterate through your list of unique fruit, testing as you go
for(i in seq_along(fruit)){
  ## subset your data to include only the current fruit to be tested
  df<-filter(data, Fruits==fruit[i])
  ## let the user know which fruit is being tested
  message(fruit[i])
  ## create a vector of the unique ripeness states of the current fruit to be tested
  ripe<-unique(df$Ripeness)
  ## make sure two means exist; ensure there are both ripe and non-ripe values
  if(length(ripe) < 2){
    ## if only one ripeness, let user know and skip to next unique fruit
    message("only one ripeness")
    next
  }
  ## try testing the fruit and return p-value if success
  tryCatch(
    {
      message(t.test(Mean ~ Ripeness, data = df)$p.value)
    },
    ## if error in t-testing return message that there are "not enough observations"
    error=function(cond) {
      message("not enough observations")
    }
  )    
}

I hope this helps!

Joshua Mire
  • thanks for the input! I'm trying to plug this in, but it doesn't seem to be working with the category of fruits that only have one type of ripeness ("yes") listed. Would there be a way to work around this while also working for the fruits that have two ripeness categories? – NickL Jun 04 '20 at 17:55
  • The above answer has been updated to include a check to ensure there are two types of "Ripeness" before comparing means! Best wishes! – Joshua Mire Jun 04 '20 at 18:51
  • Thanks again! I ran into another problem that I was hoping you could give some input on. The t.test call seems to throw an error when there aren't enough observations. Is there an if statement that could be used so that, when it hits an error, it skips that result and moves on to the next valid t.test? – NickL Jun 04 '20 at 20:59
  • My updated code wraps the `t.test()` in the `tryCatch()` function. The `tryCatch()` function can be used for catching error and warning messages. A great `tryCatch()` explanation can be found [here](https://stackoverflow.com/questions/12193779/how-to-write-trycatch-in-r). – Joshua Mire Jun 04 '20 at 23:59
  • I have a follow-up question regarding the results of this for loop. Assuming I initialized a data frame called finalOut outside of the for loop, how can I store the results of the t-test so that finalOut has one column named Fruits and another with the p-values of the t-test? – NickL Jun 09 '20 at 18:07
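
The last follow-up (collecting p-values in finalOut) is not answered in the thread. One possible approach, as a rough sketch rather than the answerer's own method: pre-allocate finalOut with one row per fruit and let `tryCatch()` return `NA` whenever `t.test()` fails. The example data below is hypothetical and made up for illustration; only the column names (Fruits, Ripeness, Mean) and the name finalOut come from the thread. Base subsetting is used instead of `filter()` so that no package is needed.

```r
# Hypothetical example data; column names follow the answer above.
data <- data.frame(
  Fruits   = rep(c("Apple", "Banana", "Cherry"), times = c(6, 3, 4)),
  Ripeness = c(rep(c("yes", "no"), each = 3),   # Apple: both states
               rep("yes", 3),                   # Banana: ripe only
               "yes", "yes", "yes", "no"),      # Cherry: a single unripe value
  Mean     = c(5.1, 6.3, 5.8, 3.2, 2.9, 4.0,
               7.0, 6.5, 7.2,
               4.4, 4.9, 5.2, 3.1)
)

fruit <- unique(data$Fruits)

# Pre-allocate one row per fruit; p-values stay NA unless a test succeeds.
finalOut <- data.frame(Fruits = fruit, p.value = NA_real_)

for (i in seq_along(fruit)) {
  # Subset to the current fruit
  df <- data[data$Fruits == fruit[i], ]
  # Store the p-value, or NA if t.test() raises an error
  finalOut$p.value[i] <- tryCatch(
    t.test(Mean ~ Ripeness, data = df)$p.value,
    error = function(cond) NA_real_
  )
}

finalOut
```

With this made-up data, only Apple should get a p-value. Banana has a single ripeness level and Cherry has only one unripe observation, so both tests fail and their rows keep `NA`. Filling a pre-allocated column avoids growing the data frame with `rbind()` inside the loop.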

Assuming fruits is coded as a categorical variable (i.e. a factor, as it should be), you could use sapply to run the test on the subset of the data for each fruit. In t.test, alternative="two.sided" is passed explicitly just for emphasis, although it is the default setting.

However, your data is very small and bananas are only ripe. I therefore use a larger sample data set to demonstrate.

res <- sapply(levels(dat$fruits), function(x) 
  t.test(mean ~ ripeness, dat[dat$fruits %in% x, ], alternative="two.sided")
)
res
#             Apple                     Banana                    Orange                   
# statistic   0.948231                  0.3432062                 0.4421971                
# parameter   23.38387                  30.86684                  16.47366                 
# p.value     0.3527092                 0.7337699                 0.664097                 
# conf.int    Numeric,2                 Numeric,2                 Numeric,2                
# estimate    Numeric,2                 Numeric,2                 Numeric,2                
# null.value  0                         0                         0                        
# stderr      0.8893453                 1.16548                   1.043739                 
# alternative "two.sided"               "two.sided"               "two.sided"              
# method      "Welch Two Sample t-test" "Welch Two Sample t-test" "Welch Two Sample t-test"
# data.name   "mean by ripeness"        "mean by ripeness"        "mean by ripeness"     

Data:

set.seed(42)
n <- 1e2
dat <- data.frame(fruits=factor(sample(1:3, n, replace=TRUE),
                                labels=c("Apple", "Banana", "Orange")),
                  ripeness=factor(rbinom(n, 1, .4), labels=c("yes", "no")),
                  mean=round(runif(n)*10))
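
Since the question asks for the mean difference per fruit, the same sapply idea can be narrowed to return only that difference and the p-value. This is a minimal sketch, not part of the original answer: the sample data is regenerated so the block runs on its own, and the names `summary_tab`, `mean_diff` and `p_value` are made up for illustration.

```r
# Regenerate the sample data so this block runs on its own.
set.seed(42)
n <- 100
dat <- data.frame(
  fruits   = factor(sample(1:3, n, replace = TRUE),
                    labels = c("Apple", "Banana", "Orange")),
  ripeness = factor(rbinom(n, 1, 0.4), labels = c("yes", "no")),
  mean     = round(runif(n) * 10)
)

# One row per fruit: difference in group means and the Welch p-value.
summary_tab <- t(sapply(levels(dat$fruits), function(x) {
  tt <- t.test(mean ~ ripeness, data = dat[dat$fruits == x, ])
  # tt$estimate holds the two group means in factor-level order ("yes", "no")
  c(mean_diff = unname(tt$estimate[1] - tt$estimate[2]),  # "yes" minus "no"
    p_value   = tt$p.value)
}))

summary_tab
```

The result is a numeric matrix with the fruits as row names, which `as.data.frame(summary_tab)` turns into a data frame if that is more convenient.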

Please note for the future that you should include a minimal self-contained example including data in an appropriate format (never images, please read here on how to do that), and all the steps you've tried so far, since Stack Overflow is not a code-writing service. Cheers!

jay.sf