I'm attempting to use dplyr to analyze experiment data. My current data set represents five patients. For each patient, there are two non-treated samples and four treated samples. I want to average the non-treated samples and then normalize all the observations for each patient to that patient's non-treated average.
Getting the baseline for each patient is easy:
library(dplyr)
library(magrittr)
baselines <- main_table %>%
  filter(Treatment == "N/A") %>%
  group_by(PATIENT.ID) %>%
  summarize(mean_CD4 = mean(CD3pos.CD8neg))
What is an efficient way to reference these values when I go back to mutate the main table? Ideally, I'd like to use PATIENT.ID to filter or select the right baseline somehow, rather than having to specify the actual patient IDs, which change from one experiment to the next.
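One common way to do this is to join the per-patient baselines back onto the main table by PATIENT.ID, so every row carries its own patient's baseline and no IDs are typed by hand. This is a minimal sketch, assuming `main_table` has the `PATIENT.ID`, `Treatment`, and `CD3pos.CD8neg` columns used above; the small `main_table` built here is made-up toy data for illustration only.

```r
library(dplyr)

# Hypothetical toy data: two patients, each with two untreated ("N/A")
# samples and two treated samples. Values are made up for illustration.
main_table <- data.frame(
  PATIENT.ID    = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Treatment     = c("N/A", "N/A", "Drug", "Drug", "N/A", "N/A", "Drug", "Drug"),
  CD3pos.CD8neg = c(90, 110, 50, 150, 200, 200, 100, 300)
)

# One row per patient: the mean of that patient's untreated samples
baselines <- main_table %>%
  filter(Treatment == "N/A") %>%
  group_by(PATIENT.ID) %>%
  summarize(mean_CD4 = mean(CD3pos.CD8neg))

# Attach each patient's baseline to all of that patient's rows by matching
# on PATIENT.ID, then normalize; no patient IDs are hard-coded
normalized <- main_table %>%
  left_join(baselines, by = "PATIENT.ID") %>%
  mutate(percent_of_baseline = CD3pos.CD8neg / mean_CD4 * 100)
```

Because the join matches on PATIENT.ID, the same code works unchanged when the next experiment uses different patient IDs.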
What I've been doing is saving the values from the summarized table and then using them inside mutate, but this solution is UGLY. I really don't like having the patient IDs hard-coded like this, because they change from experiment to experiment, and changing them manually introduces errors that are hard to catch.
patient_1_baseline <- baselines[[1, 2]]
patient_2_baseline <- baselines[[2, 2]]
main_table %>%
  mutate(percent_of_baseline = ifelse(
    PATIENT.ID == "108", CD3pos.CD8neg / patient_1_baseline * 100,
    ifelse(PATIENT.ID == "patient_2", ......
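If you'd rather keep the saved-values approach, a named vector can replace the nested `ifelse()` calls: name each baseline by its patient ID, then index the vector with PATIENT.ID. This is a sketch under the same column assumptions; `baseline_lookup` is a hypothetical name, and the data (including patient ID "205") is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data (made-up values) with the column names used above
main_table <- data.frame(
  PATIENT.ID    = c("108", "108", "108", "205", "205", "205"),
  Treatment     = c("N/A", "N/A", "Drug", "N/A", "N/A", "Drug"),
  CD3pos.CD8neg = c(40, 60, 75, 10, 30, 30)
)

baselines <- main_table %>%
  filter(Treatment == "N/A") %>%
  group_by(PATIENT.ID) %>%
  summarize(mean_CD4 = mean(CD3pos.CD8neg))

# Named vector: the names are patient IDs, the values are their baselines
baseline_lookup <- setNames(baselines$mean_CD4, baselines$PATIENT.ID)

# Indexing by name returns the matching baseline for every row, so no
# patient ID is written out by hand. as.character() guards against
# PATIENT.ID being numeric, which would index by position instead of name.
main_table <- main_table %>%
  mutate(percent_of_baseline =
           CD3pos.CD8neg / unname(baseline_lookup[as.character(PATIENT.ID)]) * 100)
```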
Another approach would be to group by patient ID, summarize to get the baseline, and then mutate, but I can't quite figure out how to do that either.
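The group-then-mutate idea also works without a separate summary step: inside a grouped `mutate()`, `mean()` only sees the current patient's rows, so subsetting to the untreated rows yields that patient's baseline. This is a minimal sketch, assuming untreated samples are marked with `Treatment == "N/A"` as in the `filter()` call above; the data is made-up toy data.

```r
library(dplyr)

# Hypothetical toy data (made-up values)
main_table <- data.frame(
  PATIENT.ID    = c(1, 1, 1, 2, 2, 2),
  Treatment     = c("N/A", "N/A", "Drug", "N/A", "Drug", "N/A"),
  CD3pos.CD8neg = c(8, 12, 5, 30, 60, 50)
)

normalized <- main_table %>%
  group_by(PATIENT.ID) %>%
  # Within each patient group, average only the untreated rows; that single
  # value is recycled across every row of the group
  mutate(baseline = mean(CD3pos.CD8neg[Treatment == "N/A"]),
         percent_of_baseline = CD3pos.CD8neg / baseline * 100) %>%
  ungroup()
```

This keeps everything in one pipeline; `ungroup()` at the end avoids surprises in later steps that would otherwise still run per patient.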
This is ultimately a symptom of a larger problem. I have the tidyverse basics down OK, but I'm struggling to move to the next level, where I can handle more complex situations like this one. Any advice about this specific scenario or the big-picture problem is deeply appreciated.
Edited to add: Sample data set
  PATIENT.ID Dose.Day Single.Live.Lymphs CD3pos.CD8neg
1        108    Day 1              42570         24324
2        108    Day 2              36026         20842
3        108    Day 3              40449         22882
4        108    Day 4              52831         32034
5        108      N/A              71348         38340
6        108      N/A              60113         34294
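The sample above can be entered as a reproducible data frame. It has no Treatment column, so this sketch assumes the untreated samples are the rows where `Dose.Day == "N/A"`. It also shows `across()` (dplyr 1.0.0 or later), which normalizes several measurement columns to their own per-patient baselines in one step; the `_pct` suffix in `.names` is an arbitrary choice.

```r
library(dplyr)

# The sample data from the question (patient 108 only)
sample_data <- data.frame(
  PATIENT.ID         = c(108, 108, 108, 108, 108, 108),
  Dose.Day           = c("Day 1", "Day 2", "Day 3", "Day 4", "N/A", "N/A"),
  Single.Live.Lymphs = c(42570, 36026, 40449, 52831, 71348, 60113),
  CD3pos.CD8neg      = c(24324, 20842, 22882, 32034, 38340, 34294)
)

normalized <- sample_data %>%
  group_by(PATIENT.ID) %>%
  # For each measurement column, divide by the mean of that column's
  # untreated ("N/A") rows within the same patient, as a percentage
  mutate(across(c(Single.Live.Lymphs, CD3pos.CD8neg),
                ~ .x / mean(.x[Dose.Day == "N/A"]) * 100,
                .names = "{.col}_pct")) %>%
  ungroup()
```

For patient 108 the CD3pos.CD8neg baseline is (38340 + 34294) / 2 = 36317, so the two N/A rows average to exactly 100 percent.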