T-test comparing multiple columns to other columns

Question

I am relatively new to R and need some help with my data analysis. In the attached table, the Master Protein Accession column lists proteins that are increased or decreased in the cortex (C) under three conditions: control (C), dehydration (D) and rehydration (R). Each condition has 5 samples: CC (1 to 5), CD (1 to 5) and CR (1 to 5). I need to run a t-test comparing the Cortex Control samples (CC1 to CC5) against the Cortex Dehydration samples (CD1 to CD5) for every protein, so that when I run the code, the row 1 CC1 value is tested against the row 1 CD1 value, the row 2 CC1 value against the row 2 CD1 value, and so on.

I tried

apply(allcor1, function(x){t.test(x[2:12],x[4:14], nchar)})

but it gives me

Error in match.fun(FUN) : argument "FUN" is missing, with no default

This is in Excel? Have you checked the Microsoft documentation for the version of Excel you are using? Do you have to install a statistics extension? — Jason Harrison, Mar 23 '21 at 21:06
Hi Jason, thanks for that. Yes, the original data is in Excel, but I have to do the t-test in R. I have managed to import the data into R, but I can't seem to find a script to run the t-test in the manner I have described above. — Rizwan, Mar 23 '21 at 21:27
Hi, Rizwan. try reading the help file of `t.test()` (run `?t.test` from your R console) and with that info, you might solve it yourself, if not you can edit the question and provide more specific information on what you are aiming. — Marcelo Avila, Mar 23 '21 at 23:17
Hi Marcelo, Thank you. I have already tried doing that. I am a novice but can find my way around basic R. But this is a little complicated. I will edit the question and provide more specific information. — Rizwan, Mar 24 '21 at 06:01
Do you need to use the `sd` columns for anything in the t-test? — David Robinson, Mar 25 '21 at 12:40

score 1 · Answer 1 · answered Mar 25 '21 at 12:58

The challenge you have is that the data is too "wide": you are representing each protein as one row when it is at least 5 data points.

The problem gets easier if you reshape it. Here I'll use tidyr's pivot functions, as well as separate.

library(dplyr)
library(tidyr)

# Removing the "sd" columns,
# and renaming first column to "protein" to be easier to work with
longer_data <- yourdata %>%
  select(-starts_with("sd")) %>%
  rename(protein = 1) %>%
  # pivot all columns besides protein into one column condition_sample
  pivot_longer(cols = c(-protein),
               names_to = "condition_sample") %>%
  # Split your CC1, CD2, etc into two columns after the second letter
  separate(condition_sample, c("condition", "sample"), 2) %>%
  # Make them wide again by condition
  pivot_wider(names_from = condition, values_from = value)

I can't test without a reproducible example, but this should give you a table with the columns protein and sample (1-5), plus one column of values per condition (CC, CD and CR).
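To sanity-check the reshape, here is a minimal sketch on made-up data. The data frame `toy`, its `accession` column and the CC1 ... CR5 column names are assumptions standing in for your table, which I have not seen. There are no sd columns in the toy data, so the `select()` step is left out.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: 3 proteins, 5 samples for each of CC, CD and CR.
# Column names are assumed, not taken from the real table.
set.seed(1)
toy <- data.frame(accession = c("P1", "P2", "P3"))
for (cond in c("CC", "CD", "CR")) {
  for (i in 1:5) {
    toy[[paste0(cond, i)]] <- rnorm(3, mean = 10)
  }
}

toy_long <- toy %>%
  rename(protein = 1) %>%
  # one row per protein / column combination
  pivot_longer(cols = -protein, names_to = "condition_sample") %>%
  # "CC1" becomes condition "CC" and sample "1"
  separate(condition_sample, c("condition", "sample"), sep = 2) %>%
  # one column per condition, one row per protein and sample
  pivot_wider(names_from = condition, values_from = value)

names(toy_long)  # "protein" "sample" "CC" "CD" "CR"
nrow(toy_long)   # 15: 3 proteins x 5 samples
```

If your real column names do not follow the two-letter-plus-digit pattern, the `sep = 2` split would need adjusting.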

At this point, the data is more flexible to be used for statistical modeling, such as a paired t-test. I use dplyr here to do grouped t-tests of CC against CD, and the broom package to tidy it.

library(broom)

longer_data %>%
  group_by(protein) %>%
  summarize(tidied_model = list(tidy(t.test(CC, CD, paired = TRUE)))) %>%
  unnest(tidied_model)

This would give you columns estimate, statistic, and p.value, among others (confidence intervals, etc) for each protein.
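If you would rather stay in base R, the row-wise test can also be sketched with `apply()`. The error in the question most likely comes from leaving out `MARGIN`: the signature is `apply(X, MARGIN, FUN)`, so a function passed as the second argument lands in the `MARGIN` slot and `FUN` is reported as missing. The toy data frame and the column names below are assumptions, not your actual table.

```r
# Hypothetical toy data with an assumed layout: an accession column,
# then CC1..CC5 and CD1..CD5.
set.seed(42)
cc_cols <- paste0("CC", 1:5)
cd_cols <- paste0("CD", 1:5)
toy <- data.frame(
  accession = c("P1", "P2", "P3"),
  matrix(rnorm(30, mean = 10), nrow = 3,
         dimnames = list(NULL, c(cc_cols, cd_cols)))
)

# MARGIN = 1 calls the function once per row; each row arrives as a
# named numeric vector, so the samples can be picked out by column name.
p_values <- apply(toy[, c(cc_cols, cd_cols)], 1, function(row) {
  t.test(row[cc_cols], row[cd_cols], paired = TRUE)$p.value
})

result <- data.frame(protein = toy$accession, p.value = p_values)
result
```

Selecting by column name rather than by position (as in `x[2:12]`) avoids accidentally overlapping the two groups. Drop `paired = TRUE` if sample 1 of CC and sample 1 of CD are not actually matched.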

Thank you for that, David. Is `longer_data` a function? Because now I get this error: "Run `rlang::last_error()` to see where the error occurred" — Rizwan, Mar 25 '21 at 14:00
You have to run the first section of code that defines `longer_data`. (Use your variable name there instead of `yourdata`). — David Robinson, Mar 25 '21 at 14:16
In answer to your question, `"condition_sample"` is the name of a new column, which is then referred to in the next `separate()` line. (It could have been anything; I named it that way because it's a combination of the condition and the sample number). You can read through how the `pivot_longer` and `pivot_wider` functions work here: https://tidyr.tidyverse.org/articles/pivot.html — David Robinson, Mar 25 '21 at 16:06
