0

I wish to learn a more efficient to get averaged values that have different suffixes, such as _L and _R.

For every brain region, I would like to create a new variable that is the averaged between the left and right hemispheres. This is denoted by the suffix _L and _R.

I am doing this manually in excel but it could be error prone.

The dataset is currently in a long format. Example: For each subject, the goal is to create a new region called odi.lobar.Temporal_Average which is the averaged of 0.030767 and 0.371568.

subject region  value
1   odi.lobar.Temporal_L    0.030767
1   odi.lobar.Temporal_R    0.371568
2   odi.lobar.Temporal_L    0.371568
2   odi.lobar.Temporal_R    0.369251

Many thanks.

emlab
  • 23
  • 3

3 Answers3

1

If you want the mean for each subject and if you only got one value per subject for each region:

Data:

df <- data.frame(subject = c("1","1","2","2"),
                 region = c("odi.lobar.Temporal_L","odi.lobar.Temporal_R","odi.lobar.Temporal_L","odi.lobar.Temporal_R"),
                 value = c(0.030767,0.371568,0.371568,0.369251))

Code:

library(dplyr)
output <- df %>%
  group_by(subject) %>%
  summarise(odi.lobar.Temporal_Average= mean(value))

Output:

  subject odi.lobar.Temporal_Average
  <chr>                        <dbl>
1 1                            0.201
2 2                            0.370

Edit: If you want to keep initial data:

output <- df %>%
  group_by(subject) %>%
  summarise(odi.lobar.Temporal_Average = mean(value))%>% 
  pivot_longer(odi.lobar.Temporal_Average) %>% 
  rename(region = name) %>% 
  bind_rows(df) %>% 
  arrange(subject)

Output:

  subject region                      value
  <chr>   <chr>                       <dbl>
1 1       odi.lobar.Temporal_Average 0.201 
2 1       odi.lobar.Temporal_L       0.0308
3 1       odi.lobar.Temporal_R       0.372 
4 2       odi.lobar.Temporal_Average 0.370 
5 2       odi.lobar.Temporal_L       0.372 
6 2       odi.lobar.Temporal_R       0.369 
MonJeanJean
  • 2,876
  • 1
  • 4
  • 20
1
library(tidyverse)
df1 %>%
  separate(region, c("region", "side"), sep = "_") %>%
  group_by(subject, region = paste0(region, "_Average")) %>%
  summarize(value = mean(value), .groups = "drop") %>%
  bind_rows(df1)


# A tibble: 6 x 3
  subject region                      value
    <int> <chr>                       <dbl>
1       1 odi.lobar.Temporal_Average 0.201 
2       2 odi.lobar.Temporal_Average 0.370 
3       1 odi.lobar.Temporal_L       0.0308
4       1 odi.lobar.Temporal_R       0.372 
5       2 odi.lobar.Temporal_L       0.372 
6       2 odi.lobar.Temporal_R       0.369 
Jon Spring
  • 55,165
  • 4
  • 35
  • 53
1

In base R, you can remove the suffix with sub and use aggregate -

res <- aggregate(value ~ region + subject, 
           transform(df, region = sub('_.*', '', region)), mean, na.rm = TRUE)
res
#             region subject     value
#1 odi.lobar.Temporal       1 0.2011675
#2 odi.lobar.Temporal       2 0.3704095
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • This works, thanks! How can I insert those values back into the original dataset, so that each subject now has values for `odi.lobar.Temporal_L` `odi.lobar.Temporal_R` and the new average? – emlab Jul 21 '21 at 07:09
  • You can use `rbind` - `res <- rbind(df, res)` – Ronak Shah Jul 21 '21 at 10:35