[Image: screenshot of the data table, not reproduced here]

I am relatively new to R and need some help with my data analysis. In the attached table, the Master Protein Accession column lists proteins that are increased or decreased in the cortex (C) under three conditions: control (C), dehydration (D) and rehydration (R). Each condition has 5 samples: CC (1 to 5), CD (1 to 5) and CR (1 to 5). I need to run a t-test comparing the Cortex Control samples (CC1 to CC5) against the Cortex Dehydration samples (CD1 to CD5) for every protein. That is, when I run the code, the row 1 CC values should be tested against the row 1 CD values, the row 2 CC values against the row 2 CD values, and so on.

I tried

apply(allcor1, function(x){t.test(x[2:12],x[4:14], nchar)})

but it gives me

Error in match.fun(FUN) : argument "FUN" is missing, with no default

emilliman5
Rizwan
  • This is in Excel? Have you checked the Microsoft documentation for the version of Excel you are using? Do you have to install a statistics extension? – Jason Harrison Mar 23 '21 at 21:06
  • Hi Jason, thanks for that. Yes, the original data is in Excel, but I have to do the t-test in R. I have managed to import the data into R, but I can't seem to find a script to run the t-test in the manner I have described above. – Rizwan Mar 23 '21 at 21:27
  • I have changed it to R now. – Rizwan Mar 23 '21 at 21:33
  • Hi, Rizwan. Try reading the help file of `t.test()` (run `?t.test` from your R console); with that info you might solve it yourself. If not, you can edit the question and provide more specific information on what you are aiming for. – Marcelo Avila Mar 23 '21 at 23:17
  • Hi Marcelo, Thank you. I have already tried doing that. I am a novice but can find my way around basic R. But this is a little complicated. I will edit the question and provide more specific information. – Rizwan Mar 24 '21 at 06:01
  • Do you need to use the `sd` columns for anything in the t-test? – David Robinson Mar 25 '21 at 12:40
  • Hi David, thanks for the query. No, I don't. – Rizwan Mar 25 '21 at 12:49
  • And you want to run a paired t-test? – David Robinson Mar 25 '21 at 12:54
  • Yes. I want to run a paired t-test – Rizwan Mar 25 '21 at 13:08

1 Answer

The challenge you have is that the data is too "wide": you are representing each protein as one row when it is at least 5 data points.

The problem gets easier if you reshape it. Here I'll use tidyr's pivot functions, as well as separate.

library(dplyr)
library(tidyr)

# Remove the "sd" columns and rename the first column to "protein"
# so it is easier to work with
longer_data <- yourdata %>%
  select(-starts_with("sd")) %>%
  rename(protein = 1) %>%
  # Pivot all columns besides protein into one column, condition_sample
  # (the values go into a column named "value" by default)
  pivot_longer(cols = -protein,
               names_to = "condition_sample") %>%
  # Split CC1, CD2, etc. into two columns after the second character
  separate(condition_sample, into = c("condition", "sample"), sep = 2) %>%
  # Make the data wide again, with one column per condition
  pivot_wider(names_from = condition, values_from = value)

I can't test without a reproducible example, but this should give you a table with the columns protein and sample (1-5), plus one column of values per condition (CC, CD and CR).
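To make the reshaping concrete, here is a minimal sketch on made-up data. The data frame `toy`, its values, and the column name `sd_CC` are hypothetical stand-ins (the real column names come from your own table), and only two samples per condition are used to keep it short.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: 2 proteins, 2 samples each for CC and CD,
# plus one made-up "sd" column that gets dropped
toy <- data.frame(
  accession = c("P1", "P2"),
  CC1 = c(1.0, 2.0), CC2 = c(1.2, 2.1),
  CD1 = c(1.5, 1.8), CD2 = c(1.9, 1.7),
  sd_CC = c(0.1, 0.1)
)

toy_long <- toy %>%
  select(-starts_with("sd")) %>%
  rename(protein = 1) %>%
  pivot_longer(cols = -protein, names_to = "condition_sample") %>%
  separate(condition_sample, into = c("condition", "sample"), sep = 2) %>%
  pivot_wider(names_from = condition, values_from = value)

# One row per protein and sample, with CC and CD side by side
print(toy_long)
```

Each row now pairs the CC and CD values that share a sample number, which is the layout a paired test needs.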

At this point, the data is more flexible to be used for statistical modeling, such as a paired t-test. I use dplyr here to do grouped t-tests of CC against CD, and the broom package to tidy it.

library(broom)

longer_data %>%
  group_by(protein) %>%
  summarize(tidied_model = list(tidy(t.test(CC, CD, paired = TRUE)))) %>%
  unnest(tidied_model)

This would give you columns estimate, statistic, and p.value, among others (confidence intervals, etc) for each protein.
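If it helps to see what `paired = TRUE` does for a single protein, the sketch below uses base R only. The vectors `cc` and `cd` hold made-up values for one hypothetical protein, and it assumes sample 1 in CC is matched with sample 1 in CD, and so on. A paired t-test is equivalent to a one-sample t-test on the per-sample differences.

```r
# Hypothetical values for one protein across five matched samples
cc <- c(10.1, 9.8, 10.4, 10.0, 9.9)
cd <- c(11.0, 10.5, 11.2, 10.9, 10.4)

# Paired t-test of CC against CD
paired <- t.test(cc, cd, paired = TRUE)

# One-sample t-test on the differences, testing a mean difference of 0
on_diffs <- t.test(cc - cd, mu = 0)

# Both calls yield the same test statistic and p-value
same_stat <- isTRUE(all.equal(unname(paired$statistic),
                              unname(on_diffs$statistic)))
same_p <- isTRUE(all.equal(paired$p.value, on_diffs$p.value))
print(c(same_stat, same_p))
```

If the samples are not actually matched across conditions, an unpaired test (`paired = FALSE`, the default) would be the more appropriate choice.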

David Robinson
  • Thank you for that, David. Is longer_data a function? Because now I get this error: "Run `rlang::last_error()` to see where the error occurred" – Rizwan Mar 25 '21 at 14:00
  • You have to run the first section of code that defines `longer_data`. (Use your variable name there instead of `yourdata`). – David Robinson Mar 25 '21 at 14:16
  • In answer to your question, `"condition_sample"` is the name of a new column, which is then referred to in the next `separate()` line. (It could have been anything; I named it that way because it's a combination of the condition and the sample number). You can read through how the `pivot_longer` and `pivot_wider` functions work here: https://tidyr.tidyverse.org/articles/pivot.html – David Robinson Mar 25 '21 at 16:06