The first step is to install and load the packages.
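install.packages("tidyverse") # only needed once per R installation
install.packages("janitor")
library(tidyverse) # attaches dplyr, tidyr, readr, ggplot2 and other core packages
library(janitor)   # provides clean_names()
library(readr)     # already attached by tidyverse; loading it again is harmless
data <- read_csv("Descriptive statistics_Sample data.csv") # assumes the file is in the working directory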
The dataset has 12 columns and 31 rows, with the following column names:
pseudonym
control(0/1)
intervention(0/1)
visit
weight-V1-3
height
Systolic.Blood.Pressure_V1-3
Diastolic.Blood.Pressure_V1-3
Pulse_V1-3
Respiration.Rate_V1-3
HS.cTnT.(ng/l)_V1-3
Myoglobin.(ug/l)_V1-3
Some of these names are awkward to work with in R because they contain characters such as parentheses, slashes, dots, and hyphens, so I cleaned them with the clean_names() function from the janitor package.
data <- clean_names(data) # lowercase snake_case names; symbols become underscores
This gives the following names:
pseudonym
control_0_1
intervention_0_1
visit
weight_v1_3
height
systolic_blood_pressure_v1_3
diastolic_blood_pressure_v1_3
pulse_v1_3
respiration_rate_v1_3
hs_c_tn_t_ng_l_v1_3
myoglobin_ug_l_v1_3
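To see what clean_names() does without the CSV file, here is a minimal sketch on a hypothetical three-column tibble that reuses three of the original column names; the values are made up. Note that the original names can only be written with backticks, which is what makes them awkward.

```r
library(tibble)
library(janitor)

# Hypothetical tibble: column names copied from the sample dataset, values invented
raw <- tibble(
  `control(0/1)` = c(1, 0),
  `weight-V1-3` = c(70, 82),
  `Respiration.Rate_V1-3` = c(14, 16)
)

# Without backticks, R would read weight-V1-3 as "weight minus V1 minus 3"
cleaned <- clean_names(raw)
names(cleaned)
# "control_0_1" "weight_v1_3" "respiration_rate_v1_3"
```

The cleaned names match the corresponding entries in the list above, and they can now be used without backticks.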
Next, we create a new categorical variable by combining the control_0_1 and intervention_0_1 variables. The name of the variable can be anything; I have named it group. We create this variable with the mutate() function and fill in its values with case_when(), which recodes values: if control_0_1 is 1, the row is labelled "control", and if intervention_0_1 is 1, it is labelled "intervention". Rows that match neither condition get NA.
mutate(group = case_when(control_0_1 == 1 ~ "control",
intervention_0_1 == 1 ~ "intervention"))
I like to move the newly created variable near the beginning of the data frame so it is easier to see. This step is optional.
relocate(group, .after = 1)
Symbols like %>% are called pipes. Read them as "and then": for example, we take data, and then mutate a new column, and then relocate it. The <- symbol assigns the result back to data, overwriting the original object.
data <- data %>%
mutate(group = case_when(control_0_1 == 1 ~ "control",
intervention_0_1 == 1 ~ "intervention")) %>% # creates a new categorical variable called "group".
relocate(group, .after = 1) # moves the group column from the end of the dataframe to after the 1st column - this step is not necessary, but I like to see the grouping variable close to the beginning of the dataframe.
data
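To make the "and then" reading concrete, the sketch below runs the same mutate(), case_when(), and relocate() steps on a small hypothetical tibble (the values are invented) twice: once with pipes and once with the calls nested inside each other. Both forms produce the same result; the piped version simply reads top to bottom.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with the same 0/1 indicator layout as the sample dataset
toy <- tibble(
  pseudonym = c(1, 2, 3),
  control_0_1 = c(1, 0, 0),
  intervention_0_1 = c(0, 1, 1),
  visit = c(1, 1, 2)
)

# Piped: take toy, and then add group, and then move it after the first column
piped <- toy %>%
  mutate(group = case_when(control_0_1 == 1 ~ "control",
                           intervention_0_1 == 1 ~ "intervention")) %>%
  relocate(group, .after = 1)

# Nested: the same calls written inside-out without pipes
nested <- relocate(
  mutate(toy, group = case_when(control_0_1 == 1 ~ "control",
                                intervention_0_1 == 1 ~ "intervention")),
  group, .after = 1
)

identical(piped, nested) # TRUE
names(piped)             # "pseudonym" "group" "control_0_1" "intervention_0_1" "visit"
```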
To get the mean for the entire dataset, we use the summarize() function. Like mutate(), it creates a new column, here called mean_resp (the name can be anything), but it collapses the data to a single row holding the mean of the respiration_rate_v1_3 column. Setting na.rm = TRUE removes missing values before the mean is calculated.
data %>%
summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE))
mean_resp
15.32258
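Why na.rm = TRUE matters can be seen on a hypothetical column with one missing value (the numbers are made up): without it, a single NA makes the whole mean NA.

```r
library(dplyr)
library(tibble)

# Hypothetical column with one missing respiration rate
toy <- tibble(respiration_rate_v1_3 = c(14, 16, NA))

without_rm <- toy %>% summarize(mean_resp = mean(respiration_rate_v1_3))
with_rm <- toy %>% summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE))

without_rm$mean_resp # NA: the missing value propagates
with_rm$mean_resp    # 15: the mean of 14 and 16
```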
To calculate the mean separately for each group, we add a group_by() step before summarize(), with the new variable inside: group_by(group).
data %>%
group_by(group) %>%
summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE))
This results in:
group mean_resp
control 14.80000
intervention 15.57143
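A minimal sketch of the same group_by() plus summarize() pattern on hypothetical values (not the sample data), where the group means are easy to check by hand:

```r
library(dplyr)
library(tibble)

# Hypothetical data: two control rows and three intervention rows (one missing)
toy <- tibble(
  group = c("control", "control", "intervention", "intervention", "intervention"),
  respiration_rate_v1_3 = c(14, 16, 15, 17, NA)
)

by_group <- toy %>%
  group_by(group) %>%
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE))

by_group
# control 15 (mean of 14, 16); intervention 16 (mean of 15, 17, NA dropped)
```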
To split the means further by visit, we add visit to the group_by() call.
data %>%
group_by(group, visit) %>%
summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE))
group visit mean_resp
control 1 14.00000
control 2 15.60000
intervention 1 15.33333
intervention 2 15.00000
intervention 3 16.11111
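With two grouping variables, summarize() drops only the last one (visit), so the result is still grouped by group and dplyr prints a message saying so. If you prefer a plain, ungrouped table, summarize() accepts a .groups argument (dplyr 1.0.0 or later). A minimal sketch on hypothetical values:

```r
library(dplyr)
library(tibble)

# Hypothetical data: the control group has no visit 3, like the sample data
toy <- tibble(
  group = c("control", "control", "control", "control",
            "intervention", "intervention", "intervention"),
  visit = c(1, 1, 2, 2, 1, 2, 3),
  respiration_rate_v1_3 = c(13, 15, 16, 16, 15, 15, 17)
)

by_group_visit <- toy %>%
  group_by(group, visit) %>%
  summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE),
            .groups = "drop") # return an ungrouped tibble, no grouping message

by_group_visit # 5 rows: control 1, 2 and intervention 1, 2, 3
```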
This has 5 rows, but it is easier to read in wide format, with one row per group and one column per visit. We can reshape it with the pivot_wider() function: it takes the column names from visit, creating three new columns named 1, 2, and 3, and fills them with values from the mean_resp column. We do this with pivot_wider(names_from = visit, values_from = mean_resp).
data %>%
group_by(group, visit) %>%
summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE)) %>%
pivot_wider(names_from = visit, values_from = mean_resp)
This results in:
group 1 2 3
control 14.00000 15.6 NA
intervention 15.33333 15.0 16.11111
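The NA in the control row appears because the grouped summary has no control row for visit 3, so pivot_wider() has nothing to put in that cell. The sketch below reproduces this on a hypothetical long summary with made-up values:

```r
library(tibble)
library(tidyr)

# Hypothetical long summary: one row per group and visit, as summarize() returns
long <- tibble(
  group = c("control", "control", "intervention", "intervention", "intervention"),
  visit = c(1, 2, 1, 2, 3),
  mean_resp = c(14, 16, 15, 15, 17)
)

# One row per group, one column per visit; missing combinations become NA
wide <- pivot_wider(long, names_from = visit, values_from = mean_resp)
wide
```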
To visualise this, we can create a ggplot.
data %>%
group_by(group, visit) %>%
summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE)) %>% # same group means as the table above
ggplot(aes(x = factor(visit), y = mean_resp, group = factor(group), color = factor(group))) +
geom_line(linewidth = 1) + # linewidth replaces the deprecated size argument for lines (ggplot2 3.4.0+)
geom_point() +
scale_color_brewer(palette = "Dark2") +
theme_minimal()

To get the means for each patient, we group by pseudonym and visit instead:
data %>%
group_by(pseudonym, visit) %>%
summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE)) %>%
pivot_wider(names_from = visit, values_from = mean_resp)
pseudonym 1 2 3
1 20 20 20
2 16 12 16
3 16 19 NA
4 13 13 14
5 13 15 16
6 18 16 16
7 12 14 18
8 13 16 18
9 16 16 11
10 13 14 16
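We can plot each patient's respiration rate across visits in the same way:
data %>%
group_by(pseudonym, visit) %>%
summarize(mean_resp = mean(respiration_rate_v1_3, na.rm = TRUE)) %>%
ggplot(aes(x = factor(visit), y = mean_resp, group = factor(pseudonym), color = factor(pseudonym))) +
geom_line(linewidth = 1) + # linewidth replaces the deprecated size argument for lines (ggplot2 3.4.0+)
geom_point() +
scale_y_continuous(limits = c(10, 21)) + # continuous scale keeps exact values; scale_y_binned would snap them into bins
scale_color_brewer(palette = "Paired") +
theme_bw()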

Google Colab link:
https://colab.research.google.com/drive/1bNZwpvEOt6dOEOoCrN_a14G5Pz021NAf?usp=sharing