How to modify multiple data frames in RStudio?

Question

I am working with multiple data frames (over 20) and would like to write a loop that adds two new columns to every data frame, each holding the mean of one of its two existing columns. I want a loop because the number of data frames can change.

Example of data:

df_1:
   Width Thickness
1  1000    1
2  1500    2

df_2:
   Width Thickness
1  1200    3
2  1200    4
3  1000    2

df_3:
   Width Thickness
1  1200    3
2  1500    4


desired outcome:
df_1:
   Width Thickness mean_width mean_thick
1  1000    1           1250       1.5
2  1500    2           1250       1.5

Can you provide a reproducible example (https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? Also this is a summary operation on the data, so it does not make much sense replicating the same value into all of the rows. — knytt, Aug 18 '20 at 08:43

score 1 · Accepted Answer · answered Aug 18 '20 at 08:45

You can collect all the data frames into a list by matching a pattern in their names with ls and mget. You can then use lapply to add the new columns to each data frame.

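# Collect every object whose name matches df_<number> into a named list
new_data <- lapply(mget(ls(pattern = 'df_\\d+')), function(x) {
  # colMeans() gives one mean per column; each is recycled into a new
  # column named mean_<original name>, e.g. mean_Width
  x[paste0('mean_', names(x))] <- as.list(colMeans(x, na.rm = TRUE))
  x
})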

new_data is a named list of the modified data frames. If you want the changes to be reflected in the original data frames, use list2env:

list2env(new_data, .GlobalEnv)
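
For a self-contained check, the example data from the question can be rebuilt and run through the same steps. This is only a sketch: it uses a scratch environment (`env`, a name chosen here for illustration) instead of `.GlobalEnv`, so nothing in your workspace is overwritten.

```r
# Rebuild the example data from the question in a scratch environment
# (`env` is a hypothetical name, not part of the original answer)
env <- new.env()
env$df_1 <- data.frame(Width = c(1000, 1500), Thickness = c(1, 2))
env$df_2 <- data.frame(Width = c(1200, 1200, 1000), Thickness = c(3, 4, 2))
env$df_3 <- data.frame(Width = c(1200, 1500), Thickness = c(3, 4))

# Same approach as above, but looking the names up in `env`
new_data <- lapply(mget(ls(env, pattern = 'df_\\d+'), envir = env), function(x) {
  x[paste0('mean_', names(x))] <- as.list(colMeans(x, na.rm = TRUE))
  x
})

# Write the modified data frames back over the originals in `env`
invisible(list2env(new_data, envir = env))

env$df_1
```

With this data, df_1 gains the columns mean_Width (1250 in every row) and mean_Thickness (1.5 in every row). Note that the new names follow the original column names, so they differ slightly from the mean_width and mean_thick shown in the question.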

Ronak Shah

That is a very neat trick for making the list of dataframes and updating them in the global environment - thanks for sharing. – John Aug 18 '20 at 08:47
Thanks for this. The solution is elegant and just the way I wanted. – T_sensaatio Aug 20 '20 at 10:56

score 0 · Answer 2 · answered Aug 18 '20 at 08:45

I would suggest making a list of dataframes and then applying a function over that list.

Below I'm using the map function from purrr (loaded as part of the tidyverse), but this is also achievable using base R and the apply family of functions:

library(tidyverse)

df_list <- list(df_1, df_2, df_3)

map(df_list, mutate, mean_width = mean(Width), mean_thick = mean(Thickness))
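
The base R route mentioned above could look roughly like this. It is a sketch that builds the question's example data inline and names the list elements (an addition, not part of the original answer) so each result can be matched to its source.

```r
# Example data from the question, stored as a named list
df_list <- list(
  df_1 = data.frame(Width = c(1000, 1500), Thickness = c(1, 2)),
  df_2 = data.frame(Width = c(1200, 1200, 1000), Thickness = c(3, 4, 2)),
  df_3 = data.frame(Width = c(1200, 1500), Thickness = c(3, 4))
)

# lapply() plays the role of map(); each mean is recycled down the rows
result <- lapply(df_list, function(x) {
  x$mean_width <- mean(x$Width)
  x$mean_thick <- mean(x$Thickness)
  x
})

result$df_2
```

Because the list is built explicitly, it keeps working when the number of data frames changes, as long as the list itself is kept up to date.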

score 0 · Answer 3 · answered Aug 18 '20 at 19:52

It would be better to combine everything into a single dataset and then do a grouped operation:

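library(dplyr)
# Stack all df_<n> data frames into one, tagging each row with its source name
mget(ls(pattern = 'df_\\d+')) %>%
      bind_rows(.id = 'grp') %>%
      group_by(grp) %>%
      # dplyr >= 1.1 deprecates passing extra arguments through across(), so use a lambda
      mutate(across(everything(), ~ mean(.x, na.rm = TRUE), .names = "mean_{.col}")) %>%
      ungroup()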
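
If dplyr is not available, the same single-table idea can be sketched in base R with ave(). This is a substitute technique rather than what the code above does, and the names `dfs` and `combined` are chosen here for illustration.

```r
# Example data from the question, stored as a named list
dfs <- list(
  df_1 = data.frame(Width = c(1000, 1500), Thickness = c(1, 2)),
  df_2 = data.frame(Width = c(1200, 1200, 1000), Thickness = c(3, 4, 2)),
  df_3 = data.frame(Width = c(1200, 1500), Thickness = c(3, 4))
)

# Stack the data frames, adding a grp column that records the source name
combined <- do.call(
  rbind,
  Map(function(d, id) cbind(grp = id, d), dfs, names(dfs))
)
rownames(combined) <- NULL

# ave() applies mean (its default FUN) within each group and
# returns a vector aligned with the original rows
combined$mean_Width <- ave(combined$Width, combined$grp)
combined$mean_Thickness <- ave(combined$Thickness, combined$grp)

# Optional: split back into one data frame per source, dropping grp
split(combined[-1], combined$grp)
```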
