Comparing two groups (control and intervention) for clinical study, multiple visits, descriptive statistics

Question

I am trying to compare two groups of patients (control and intervention) across multiple study visits.

Examples of measurements: hemoglobin, troponin, myoglobin, creatinine, C-reactive protein (CRP)

That is, I would like to see differences between the groups at each visit, e.g. the intervention group has lower CRP at visit 2 than the controls. Additionally, I would like to compare patients with themselves, e.g. patient 2 has lower CRP at visit 3 than at visit 2.

Ultimately, I would like to show my data graphically (one plot per marker, with one line for the intervention mean and one for the control mean) and primarily do descriptive statistics without testing, since my sample size is pretty small and this is more exploratory.

So far I have created a .csv file with all the data, including columns indicating whether each patient is a control or an intervention patient. The table can be sorted by visit, control/intervention, and patient ID.

Welcome to R! It's a fun language. Can you share some sample data? — writer_typer, Jan 24 '22 at 14:44
Hi @TyperWriter, here is a dropbox link: https://www.dropbox.com/sh/36d6cm5q91niws9/AABVRkydnNAh9dZPCYBbGWada?dl=0 (for data protection I deleted many columns and changed numbers, but it gives an idea of what to look at). Would you like me to share it differently? Thank you! — Rnewbie, Jan 24 '22 at 15:02
@Rnewbie it's best to share it using ``dput()`` - please take a moment to read [this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Thanks. — user438383, Jan 24 '22 at 15:24
Thank you, @user438383! When I put this in the console and try to copy it here it has too many characters though, is there any way around this problem? — Rnewbie, Jan 24 '22 at 15:53
I ran dput(head(df, 100)), but this somehow changed the data (it added an "L" after the numbers). The copied text is also still too long to paste here. :( — Rnewbie, Jan 24 '22 at 16:09
The added L is fine, it’s just R’s way of representing integer values. Keep reducing the number until you can add it into the question. — user438383, Jan 24 '22 at 18:55
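
As a sketch of the `dput()` route suggested in these comments: selecting a few columns and rows first keeps the output short enough to paste. The data frame `df` and its values below are made up for illustration.

```r
# Made-up stand-in for the real data; values are illustrative only.
df <- data.frame(
  pseudonym = c(1L, 1L, 2L, 2L),
  visit     = c(1L, 2L, 1L, 2L),
  crp       = c(5.2, 3.1, 7.8, 6.4)
)

# Keep only the columns and rows the question needs,
# then print a copy-pasteable definition of that small object.
small <- head(df[, c("pseudonym", "visit", "crp")], 3)
dput(small)

# The "L" suffix in the output (e.g. 1L) only marks integer values;
# the data themselves are unchanged.
```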

writer_typer · Accepted Answer · 2022-01-25T21:00:17.983

The first step is to install and load the packages, and then read in the data.

install.packages("tidyverse")
install.packages("janitor")

library(tidyverse)
library(janitor)

library(readr)
data <- read_csv("Descriptive statistics_Sample data.csv")

The dataset had 12 columns and 31 rows, with the following column names:

pseudonym               
control(0/1)                
intervention(0/1)               
visit               
weight-V1-3             
height              
Systolic.Blood.Pressure_V1-3                
Diastolic.Blood.Pressure_V1-3               
Pulse_V1-3              
Respiration.Rate_V1-3
HS.cTnT.(ng/l)_V1-3             
Myoglobin.(ug/l)_V1-3

Some of these names may be hard to work with in R, so I cleaned the names using a function called clean_names() from the janitor package.

data <- clean_names(data)

pseudonym               
control_0_1             
intervention_0_1                
visit               
weight_v1_3             
height              
systolic_blood_pressure_v1_3                
diastolic_blood_pressure_v1_3               
pulse_v1_3              
respiration_rate_v1_3
hs_c_tn_t_ng_l_v1_3             
myoglobin_ug_l_v1_3
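
To preview what `clean_names()` will do before touching the real file, `make_clean_names()` from the same package can be applied to the raw names directly. A small sketch using three of the column names listed above:

```r
library(janitor)

# Three of the original column names from the CSV file.
raw_names <- c("control(0/1)", "weight-V1-3", "Pulse_V1-3")

# make_clean_names() works on a character vector of names; it lowercases
# them and replaces punctuation with underscores.
cleaned <- make_clean_names(raw_names)
print(data.frame(raw = raw_names, cleaned = cleaned))
```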

Next, we need to create a new categorical variable by combining the control_0_1 and intervention_0_1 variables. The name of the variable can be anything; I have named it group. We create this variable using the mutate function, and fill in its values using the case_when function, which helps us recode values. If there is a 1 in the control_0_1 variable, we ask it to call the row "control", and similarly for the intervention_0_1 variable.

mutate(group = case_when(control_0_1 == 1 ~ "control",
                           intervention_0_1 == 1 ~ "intervention"))

I like to move the newly created variable to the beginning of the dataframe so it is easier to see. This step is optional.

relocate(group, .after = 1)

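Symbols like %>% are called pipes. Read them as "and then". For example: we take data (and then) mutate a new column (and then) relocate it. The mutate and relocate snippets above are fragments of this pipeline and will not run on their own; relocate run by itself fails with "object 'group' not found" because it has no dataframe to work on. We are also overwriting the data object with the <- symbol.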

data <- data %>% 
  mutate(group = case_when(control_0_1 == 1 ~ "control",
                           intervention_0_1 == 1 ~ "intervention")) %>% # creates a new categorical variable called "group". 
  relocate(group, .after = 1) # moves the group column from the end of the dataframe to after the 1st column - this step is not necessary, but I like to see the grouping variable close to the beginning of the dataframe. 

data
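
`case_when()` returns `NA` for rows where none of the conditions match, so it may be worth checking that every row received a group. A minimal sketch on made-up data (the values, and the deliberately inconsistent fourth row, are assumptions for illustration):

```r
library(dplyr)

toy <- tibble(
  pseudonym        = c(1, 2, 3, 4),
  control_0_1      = c(1, 0, 0, 0),
  intervention_0_1 = c(0, 1, 1, 0)   # row 4 has neither flag set
)

toy <- toy %>%
  mutate(group = case_when(control_0_1 == 1 ~ "control",
                           intervention_0_1 == 1 ~ "intervention"))

# Cross-tabulate the new variable against the original flags;
# an NA in `group` points to a row with no flag set.
group_check <- toy %>% count(group, control_0_1, intervention_0_1)
print(group_check)
```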

To get the means for the entire dataset, we use a function called summarize. This is similar to mutate where we create a new column called mean_resp (name can be anything) and calculate the mean of the respiration_rate_v1_3 column. We also remove missing values if we need to with na.rm = TRUE.

data %>%
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE))

mean_resp
15.32258

To split this by the new group variable, we add a line with the group_by function and put the group variable inside it: group_by(group).

data %>% 
  group_by(group) %>% 
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE))

This results in:

group          mean_resp
control         14.80000            
intervention    15.57143

To further group this by visits, we have to add visit to the group_by function.

data %>% 
  group_by(group, visit) %>% 
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE))

group       visit   mean_resp
control         1   14.00000        
control         2   15.60000        
intervention    1   15.33333        
intervention    2   15.00000        
intervention    3   16.11111
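
Because the aim is descriptive, the same `summarize()` call can report more than the mean. The sketch below adds the number of non-missing values, the standard deviation, and the median; the toy values are invented, and the column names simply mirror those used above.

```r
library(dplyr)

toy <- tibble(
  group = rep(c("control", "intervention"), each = 4),
  visit = rep(c(1, 1, 2, 2), times = 2),
  respiration_rate_v1_3 = c(14, 16, 15, NA, 12, 18, 15, 17)
)

resp_summary <- toy %>%
  group_by(group, visit) %>%
  summarize(
    n         = sum(!is.na(respiration_rate_v1_3)),  # values actually measured
    mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE),
    sd_resp   = sd(respiration_rate_v1_3, na.rm = TRUE),
    med_resp  = median(respiration_rate_v1_3, na.rm = TRUE),
    .groups   = "drop"  # return an ungrouped table
  )
print(resp_summary)

# A cell with a single measured value has no standard deviation (NA),
# which is worth keeping in mind with small samples.
```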

This has 5 rows, but it would be nicer to see it as a wide table with one row per group and one column per visit.

This can be done with the pivot_wider function. We take the names from the visit column to create three new columns simply called 1, 2, and 3. The values for these new columns come from the mean_resp column: pivot_wider(names_from = visit, values_from = mean_resp).

data %>% 
  group_by(group, visit) %>% 
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE)) %>% 
  pivot_wider(names_from = visit, values_from = mean_resp)

This results in

group             1          2       3
control          14.00000   15.6    NA  
intervention     15.33333   15.0    16.11111
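
Column names such as `1`, `2`, `3` need backticks whenever they are referenced later. One option, sketched here on a hand-typed copy of the summary table above, is the `names_prefix` argument of `pivot_wider()`:

```r
library(dplyr)
library(tidyr)

# Hand-typed copy of the group-by-visit means shown above.
means_long <- tibble(
  group     = c("control", "control",
                "intervention", "intervention", "intervention"),
  visit     = c(1, 2, 1, 2, 3),
  mean_resp = c(14, 15.6, 15.33333, 15, 16.11111)
)

means_wide <- means_long %>%
  pivot_wider(names_from = visit,
              values_from = mean_resp,
              names_prefix = "visit_")  # gives visit_1, visit_2, visit_3
print(means_wide)

# Combinations that do not occur in the long table (control at visit 3)
# are filled with NA.
```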

To visualise this, we can create a ggplot.

data %>% 
  group_by(group, visit) %>% 
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE)) %>% # na.rm = TRUE, as in the tables above
  ggplot(aes(x = factor(visit), y = mean_resp, group = factor(group), color = factor(group))) +
  geom_line(linewidth = 1) + # linewidth needs ggplot2 >= 3.4.0; use size = 1 on older versions
  geom_point() +
  scale_color_brewer(palette = "Dark2") +
  theme_minimal()
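
The question asks for one plot per marker. A possible approach, sketched on invented data, is to reshape the marker columns into long format with `pivot_longer()` and let `facet_wrap()` draw one panel per marker. The column names `crp` and `myoglobin` and all values are assumptions for illustration; `linewidth` needs ggplot2 3.4.0 or later (use `size` on older versions).

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Invented data: 4 patients, 2 visits, 2 markers.
toy <- tibble(
  pseudonym = rep(1:4, each = 2),
  group     = rep(c("control", "intervention"), each = 4),
  visit     = rep(1:2, times = 4),
  crp       = c(5, 6, 7, 6, 8, 4, 9, 5),
  myoglobin = c(30, 32, 28, 31, 35, 27, 33, 26)
)

# One row per patient, visit and marker, then the mean per marker/group/visit.
marker_means <- toy %>%
  pivot_longer(cols = c(crp, myoglobin),
               names_to = "marker", values_to = "value") %>%
  group_by(marker, group, visit) %>%
  summarize(mean_value = mean(value, na.rm = TRUE), .groups = "drop")

marker_plot <- ggplot(marker_means,
                      aes(x = factor(visit), y = mean_value,
                          group = group, color = group)) +
  geom_line(linewidth = 1) +
  geom_point() +
  facet_wrap(~ marker, scales = "free_y") +  # one panel per marker, own y-axis
  labs(x = "Visit", y = "Mean value", color = "Group") +
  theme_minimal()

print(marker_plot)
```

Free y-axes are used here because markers are typically measured in different units; drop `scales = "free_y"` if a shared axis is preferred.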

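To get the values by patient, group by pseudonym instead of group. If each patient has a single measurement per visit, the mean is simply that measurement: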

data %>% 
  group_by(pseudonym, visit) %>% 
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE)) %>% 
  pivot_wider(names_from = visit, values_from = mean_resp)

pseudonym       1   2   3
   1            20  20  20  
   2            16  12  16  
   3            16  19  NA  
   4            13  13  14  
   5            13  15  16  
   6            18  16  16  
   7            12  14  18  
   8            13  16  18  
   9            16  16  11  
  10            13  14  16

data %>% 
  group_by(pseudonym, visit) %>% 
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE)) %>% # na.rm = TRUE, as in the table above
  ggplot(aes(x = factor(visit), y = mean_resp, group = factor(pseudonym), color = factor(pseudonym))) +
  geom_line(linewidth = 1) + # linewidth needs ggplot2 >= 3.4.0; use size = 1 on older versions
  geom_point() +
  scale_y_binned(limits = c(10, 21)) +
  scale_color_brewer(palette = "Paired") +
  theme_bw()
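
For the within-patient comparison the question mentions (for example, a patient having a lower value at visit 3 than at visit 2), one option is to compute the change from the previous visit with `lag()`. A sketch using three patients from the table above:

```r
library(dplyr)

# Respiration rates for patients 2, 7 and 9, copied from the table above.
toy <- tibble(
  pseudonym = rep(c(2, 7, 9), each = 3),
  visit     = rep(1:3, times = 3),
  respiration_rate_v1_3 = c(16, 12, 16,
                            12, 14, 18,
                            16, 16, 11)
)

changes <- toy %>%
  arrange(pseudonym, visit) %>%   # make sure visits are in order
  group_by(pseudonym) %>%         # so lag() never crosses patients
  mutate(change = respiration_rate_v1_3 - lag(respiration_rate_v1_3)) %>%
  ungroup()
print(changes)

# `change` is NA at each patient's first visit; a negative value means the
# measurement fell compared with the previous visit.
```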

Google Colab link:

https://colab.research.google.com/drive/1bNZwpvEOt6dOEOoCrN_a14G5Pz021NAf?usp=sharing

Dear @TyperWriter, thank you so much! I did what you suggested, but unfortunately got an error message when I tried to relocate "group": "> relocate(group, .after = 1) Error in relocate(group, .after = 1) : object 'group' not found". Since R could not find "group", i.e. the new variable, I wonder if the mutate code maybe did not work? Is there a good way to show you my output? — Rnewbie, Jan 25 '22 at 13:48
It may be better to load `library(tidyverse)` to check if the `mutate` function is available to use. — writer_typer, Jan 25 '22 at 14:24
I did load tidyverse from the library first, but it unfortunately didn't change the error message. — Rnewbie, Jan 25 '22 at 17:46
Sometimes it helps to uninstall and reinstall R and then install the packages. I'm sorry I couldn't help further on this error message. Please let me know if you have any questions. — writer_typer, Jan 25 '22 at 18:37
thank you for your help. I will try some more and maybe get back to you. I just uploaded the sampledata into the workspace you created, but haven't figured out how to write code over there just yet. — Rnewbie, Jan 25 '22 at 18:51
I noticed you had two quotations for read_csv. And you were loading data from the folder. So you'll have to change it to `data <- read_csv("Sampledata/Descriptive statistics_Sample data.csv")` — writer_typer, Jan 25 '22 at 18:58
I just installed the tidyverse and janitor packages over there, but I get this error: "/cloud/project$ library(tidyverse) bash: syntax error near unexpected token `tidyverse' /cloud/project$". — Rnewbie, Jan 25 '22 at 19:01
Unfortunately, I really can't see the notebook or the green triangles ... sorry for getting so hung up on all the small stuff :( — Rnewbie, Jan 25 '22 at 19:03
Perhaps trying it on Google Colab might work. Try this link: https://colab.research.google.com/drive/1bNZwpvEOt6dOEOoCrN_a14G5Pz021NAf?usp=sharing — writer_typer, Jan 25 '22 at 19:31
Click on the play button on the left of each cell to run the code. Repeat for each cell. — writer_typer, Jan 25 '22 at 19:32
Hi Typerwriter, thank you so, so much for all the help! :) I understand the code you wrote and am really happy about this (maybe I will still find something that I will ask you about later). I still don't really know what the issue is on my local RStudio, but I will try to uninstall and reinstall everything as you suggested. — Rnewbie, Jan 25 '22 at 20:38
You are welcome. :) You will probably get help from most Stack Overflow users if your questions are simple and about just one thing. It can be frustrating to split your project into simple questions, but short questions are more likely to get quick answers. — writer_typer, Jan 25 '22 at 21:12
