I am trying to compare two groups of patients (control and intervention) across multiple study visits.

Examples of measurements: hemoglobin, troponin, myoglobin, creatinine, C-reactive protein (CRP).

This means I would like to see the difference between these groups at each visit, e.g. the intervention group has lower CRP at visit 2 than the controls. Additionally, I would like to compare patients with themselves, e.g. patient 2 has lower CRP at visit 3 than at visit 2.

Ultimately, I would like to show my data graphically (one plot per marker, with one line for the mean of the intervention group and one for the controls) and primarily do descriptive statistics without testing, since my sample size is pretty small and this is more exploratory.

So far I have created a .csv with all the data, with columns indicating whether a patient is control or intervention. The table is sortable by visit, control/intervention and patient ID.

Rnewbie
  • Welcome to R! It's a fun language. Can you share some sample data? – writer_typer Jan 24 '22 at 14:44
  • Hi @TyperWriter, here is a dropbox link: https://www.dropbox.com/sh/36d6cm5q91niws9/AABVRkydnNAh9dZPCYBbGWada?dl=0 (for data protection I deleted many columns and changed numbers, but it gives an idea of what to look at). Would you like me to share it differently? Thank you! – Rnewbie Jan 24 '22 at 15:02
  • @Rnewbie it's best to share it using ``dput()`` - please take a moment to read [this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Thanks. – user438383 Jan 24 '22 at 15:24
  • Thank you, @user438383! When I put this in the console and try to copy it here it has too many characters though, is there any way around this problem? – Rnewbie Jan 24 '22 at 15:53
  • try ``dput(head(df, 100))`` – user438383 Jan 24 '22 at 15:56
  • I ran `dput(head(df, 100))`, but this somehow changed the data (it added an "L" after the numbers). The copied output is also still too long to paste here. :( – Rnewbie Jan 24 '22 at 16:09
  • The added L is fine, it’s just R’s way of representing integer values. Keep reducing the number until you can add it into the question. – user438383 Jan 24 '22 at 18:55

1 Answer

The first step is to install and load the packages, then read in the data.

install.packages("tidyverse")
install.packages("janitor")

library(tidyverse)
library(janitor)

library(readr)
data <- read_csv("Descriptive statistics_Sample data.csv")

The dataset has 12 columns and 31 rows, with the following column names:

pseudonym               
control(0/1)                
intervention(0/1)               
visit               
weight-V1-3             
height              
Systolic.Blood.Pressure_V1-3                
Diastolic.Blood.Pressure_V1-3               
Pulse_V1-3              
Respiration.Rate_V1-3
HS.cTnT.(ng/l)_V1-3             
Myoglobin.(ug/l)_V1-3

Some of these names may be hard to work with in R, so I cleaned the names using a function called clean_names() from the janitor package.

data <- clean_names(data)

pseudonym               
control_0_1             
intervention_0_1                
visit               
weight_v1_3             
height              
systolic_blood_pressure_v1_3                
diastolic_blood_pressure_v1_3               
pulse_v1_3              
respiration_rate_v1_3
hs_c_tn_t_ng_l_v1_3             
myoglobin_ug_l_v1_3

Next, we need to create a new categorical variable by combining the control_0_1 and intervention_0_1 variables. The name of the variable can be anything; I have named it group. We create it with the mutate function and fill in its values with the case_when function, which recodes values: if there is a 1 in control_0_1, the row is labelled "control", and if there is a 1 in intervention_0_1, it is labelled "intervention".

mutate(group = case_when(control_0_1 == 1 ~ "control",
                           intervention_0_1 == 1 ~ "intervention"))

I like to move the newly created variable to the beginning of the dataframe so it is easier to see. This step is not necessary.

relocate(group, .after = 1) 

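Symbols like %>% are called pipes. Read them as "and then". For example, we take data (and then) mutate a new column (and then) relocate it. The two snippets above are steps of this one pipeline and will not run on their own: without data piped in, relocate(group, .after = 1) fails with "object 'group' not found". We also overwrite the data object using the <- symbol.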

data <- data %>% 
  mutate(group = case_when(control_0_1 == 1 ~ "control",
                           intervention_0_1 == 1 ~ "intervention")) %>% # creates a new categorical variable called "group". 
  relocate(group, .after = 1) # moves the group column from the end of the dataframe to after the 1st column - this step is not necessary, but I like to see the grouping variable close to the beginning of the dataframe. 

data
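As a quick way to try the recoding step in isolation, here is a minimal sketch on a small made-up data frame. The `toy` object and its values are hypothetical and only mimic the column names from the cleaned dataset; the final `count()` is an optional check that is not part of the pipeline above.

```r
library(dplyr)

# Hypothetical toy data: values are invented for illustration only
toy <- data.frame(
  pseudonym        = c(1, 1, 2, 2),
  control_0_1      = c(1, 1, 0, 0),
  intervention_0_1 = c(0, 0, 1, 1),
  visit            = c(1, 2, 1, 2)
)

toy <- toy %>%
  mutate(group = case_when(control_0_1 == 1 ~ "control",
                           intervention_0_1 == 1 ~ "intervention")) %>%
  relocate(group, .after = 1) # move group to directly after the first column

# Optional check: an NA group would mean that neither flag was 1 for some row
count(toy, group)
```

If `count()` shows an `NA` row on the real data, some patients have neither flag set and would be silently dropped from group summaries unless handled.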

To get the mean for the entire dataset, we use a function called summarize. Like mutate, it creates a new column, here called mean_resp (the name can be anything), holding the mean of the respiration_rate_v1_3 column, but it collapses the data to a single row. na.rm = TRUE removes missing values before the mean is calculated.

data %>%
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE))

mean_resp
15.32258    

To group this by the new group variable, we add a new line with group_by function and add the group variable inside like this group_by(group).

data %>% 
  group_by(group) %>% 
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE))

This results in:

group          mean_resp
control         14.80000            
intervention    15.57143    

To further group this by visits, we have to add visit to the group_by function.

data %>% 
  group_by(group, visit) %>% 
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE))

This results in:

group       visit   mean_resp
control         1   14.00000        
control         2   15.60000        
intervention    1   15.33333        
intervention    2   15.00000        
intervention    3   16.11111    
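Since the question asks for descriptive statistics rather than tests, the same `group_by()` and `summarize()` pattern can return several statistics at once. The following is only a sketch under assumptions: the `toy` data frame and its values are invented, and the choice of statistics (count of non-missing values, mean, standard deviation, median) is one possibility, not something taken from the original data.

```r
library(dplyr)

# Hypothetical toy data: values are invented for illustration only
toy <- data.frame(
  group = rep(c("control", "intervention"), each = 4),
  visit = rep(c(1, 1, 2, 2), times = 2),
  respiration_rate_v1_3 = c(14, 16, 15, NA, 12, 18, 16, 14)
)

desc <- toy %>%
  group_by(group, visit) %>%
  summarize(
    n_obs       = sum(!is.na(respiration_rate_v1_3)), # non-missing values per cell
    mean_resp   = mean(respiration_rate_v1_3, na.rm = TRUE),
    sd_resp     = sd(respiration_rate_v1_3, na.rm = TRUE),
    median_resp = median(respiration_rate_v1_3, na.rm = TRUE),
    .groups = "drop" # return an ungrouped table
  )

desc
```

Reporting `n_obs` next to each mean is useful with a small sample, because a group and visit with a single observation has no standard deviation (`sd()` returns `NA` there).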

This has 5 rows, but it would be easier to read as a wide table with one column per visit.

This can be done with the pivot_wider function. We take the names from the visit column and create three new columns simply called 1, 2 and 3. The values for these new columns come from the mean_resp column. The call is pivot_wider(names_from = visit, values_from = mean_resp).

data %>% 
  group_by(group, visit) %>% 
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE)) %>% 
  pivot_wider(names_from = visit, values_from = mean_resp)

This results in

group             1          2       3
control          14.00000   15.6    NA  
intervention     15.33333   15.0    16.11111    

To visualise this, we can create a ggplot.

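data %>% 
  group_by(group, visit) %>% 
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE)) %>% # na.rm = TRUE as in the tables above, so a missing value does not turn a mean into NA
  ggplot(aes(x = factor(visit), y = mean_resp, group = factor(group), color = factor(group))) +
  geom_line(linewidth = 1) + # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_point() +
  scale_color_brewer(palette = "Dark2") +
  theme_minimal()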

[Figure: line plot of mean respiration rate by visit, one line per group]

To get the means by patient and visit:

data %>% 
  group_by(pseudonym, visit) %>% 
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE)) %>% 
  pivot_wider(names_from = visit, values_from = mean_resp)

This results in:

pseudonym       1   2   3
   1            20  20  20  
   2            16  12  16  
   3            16  19  NA  
   4            13  13  14  
   5            13  15  16  
   6            18  16  16  
   7            12  14  18  
   8            13  16  18  
   9            16  16  11  
  10            13  14  16  
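For the within-patient comparison asked about in the question (for example, whether a patient's value is lower at visit 3 than at visit 2), one option is to compute the change between visits from the wide table. This is a sketch, not part of the original answer: the `toy` values are copied from the rows for patients 1 and 2 in the table above, and the names `visit_1`, `change_1_to_2` and so on are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Values for patients 1 and 2, taken from the by-patient table above
toy <- data.frame(
  pseudonym = c(1, 1, 1, 2, 2, 2),
  visit     = c(1, 2, 3, 1, 2, 3),
  respiration_rate_v1_3 = c(20, 20, 20, 16, 12, 16)
)

changes <- toy %>%
  pivot_wider(id_cols = pseudonym,
              names_from = visit,
              values_from = respiration_rate_v1_3,
              names_prefix = "visit_") %>% # columns visit_1, visit_2, visit_3
  mutate(change_1_to_2 = visit_2 - visit_1,
         change_2_to_3 = visit_3 - visit_2)

changes
```

A negative change means the value dropped between the two visits; a patient with a missing visit gets `NA` for any change that involves it.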

To plot one line per patient:

data %>% 
  group_by(pseudonym, visit) %>% 
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE)) %>% 
  ggplot(aes(x = factor(visit), y = mean_resp, group = factor(pseudonym), color = factor(pseudonym))) +
  geom_line(linewidth = 1) + # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_point() +
  scale_y_binned(limits = c(10, 21)) +
  scale_color_brewer(palette = "Paired") +
  theme_bw()

[Figure: line plot of respiration rate by visit, one line per patient]
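The question also asks for one plot per marker. One possible approach, sketched here under assumptions, is to reshape the marker columns into long format with `pivot_longer()` and draw one panel per marker with `facet_wrap()`. The `toy` values are invented, and the two marker columns are only examples; the real data would list all marker columns in `cols`.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data: values are invented for illustration only
toy <- data.frame(
  group = rep(c("control", "intervention"), each = 4),
  visit = rep(c(1, 1, 2, 2), times = 2),
  pulse_v1_3            = c(70, 74, 72, 76, 80, 78, 75, 73),
  respiration_rate_v1_3 = c(14, 16, 15, 17, 12, 18, 16, 14)
)

marker_means <- toy %>%
  pivot_longer(cols = c(pulse_v1_3, respiration_rate_v1_3),
               names_to = "marker", values_to = "value") %>%
  group_by(group, visit, marker) %>%
  summarize(mean_value = mean(value, na.rm = TRUE), .groups = "drop")

p <- ggplot(marker_means,
            aes(x = factor(visit), y = mean_value, group = group, color = group)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ marker, scales = "free_y") + # one panel per marker, each with its own y-axis
  labs(x = "Visit", y = "Mean value", color = "Group") +
  theme_minimal()

p
```

`scales = "free_y"` is used because markers are measured in different units, so a shared y-axis would flatten the markers with smaller values.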

Google Colab link:

https://colab.research.google.com/drive/1bNZwpvEOt6dOEOoCrN_a14G5Pz021NAf?usp=sharing

writer_typer
  • Dear @TyperWriter, thank you so much! I did what you suggested, but unfortunately got an error message when I tried to relocate "group": "> relocate(group, .after = 1) Error in relocate(group, .after = 1) : object 'group' not found". Since R could not find "group", i.e. the new variable, I wonder if the mutate code maybe did not work? Is there a good way to show you my output? – Rnewbie Jan 25 '22 at 13:48
  • It may be better to load `library(tidyverse)` to check if the `mutate` function is available to use. – writer_typer Jan 25 '22 at 14:24
  • I was able to load the workspace :) Thank you! – Rnewbie Jan 25 '22 at 17:45
  • I did load tidyverse from the library first, but it unfortunately didn't change the error message. – Rnewbie Jan 25 '22 at 17:46
  • Sometimes it helps to uninstall and reinstall R and then install the packages. I'm sorry I couldn't help further on this error message. Please let me know if you have any questions. – writer_typer Jan 25 '22 at 18:37
  • thank you for your help. I will try some more and maybe get back to you. I just uploaded the sampledata into the workspace you created, but haven't figured out how to write code over there just yet. – Rnewbie Jan 25 '22 at 18:51
  • I noticed you had two quotations for read_csv. And you were loading data from the folder. So you'll have to change it to `data <- read_csv("Sampledata/Descriptive statistics_Sample data.csv")` – writer_typer Jan 25 '22 at 18:58
  • I just installed the tidyverse and janitor packages over there, but I get this error: "/cloud/project$ library(tidyverse) bash: syntax error near unexpected token `tidyverse' /cloud/project$". – Rnewbie Jan 25 '22 at 19:01
  • Unfortunately, I really can't see the notebook or the green triangles ... sorry for getting so hung up on all the small stuff :( – Rnewbie Jan 25 '22 at 19:03
  • Perhaps trying it on Google Colab might work. Try this link: https://colab.research.google.com/drive/1bNZwpvEOt6dOEOoCrN_a14G5Pz021NAf?usp=sharing – writer_typer Jan 25 '22 at 19:31
  • Click on the play button on the left of each cell to run the code. Repeat for each cell. – writer_typer Jan 25 '22 at 19:32
  • Hi Typerwriter, thank you so, so much for all the help! :) I understand the code you wrote and am really happy about this (maybe I will still find something that I will ask you about later). I still don't really know what the issue is on my local RStudio, but I will try to uninstall and reinstall everything as you suggested. – Rnewbie Jan 25 '22 at 20:38
  • You are welcome. :) You will probably get help from most Stack Overflow users if your questions are simple and about just one thing. It can be frustrating to separate your project into simple questions, but if you ask short questions, you are likely to get quick answers. – writer_typer Jan 25 '22 at 21:12