Apply self-defined function on list of data frames in R

Question

This is a follow-up to a previous question of mine: Code in R to conditionally subtract columns in data frames

I now want to apply the given solution to my previous problem

cols <- grep('^\\d+$', names(df), value = TRUE)
new_cols <- paste0(cols, '_corrected')
df[new_cols] <- df[cols] - df[paste0('Background_', cols)]
df[c("Wavelength", new_cols)]

to every data frame in a list. I import all sheets of an excel file so that every sheet becomes one data frame in a list using this code (courtesy of Read all worksheets in an Excel workbook into an R list with data.frames 's top answer):

read_excel_allsheets <- function(filename, tibble = FALSE) {
  sheets <- readxl::excel_sheets(filename)
  x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
  if(!tibble) x <- lapply(x, as.data.frame)
  names(x) <- sheets
  x
}

mysheets <- read_excel_allsheets(file.choose())

How do I apply the first code box to my data frame list?

I want to get from something like this:

df_1 <- structure(list(Wavelength = 300:301, Background_1 = c(5L, 3L), 
                     `1` = c(11L, 12L), Background_2 = c(4L, 5L), `2` = c(12L, 10L)), 
                class = "data.frame", row.names = c(NA, -2L))

df_2 <- structure(list(Wavelength = 300:301, Background_1 = c(6L, 4L),
                     `1` = c(10L, 13L), Background_2 = c(5L, 6L), `2` = c(11L, 11L),
                     Background_3 = c(4L, 6L), `3` = c(13L, 13L)),
                class = "data.frame", row.names = c(NA, -2L))

df_list <- list(df_1, df_2)

To something like this:

df_1_corrected <- structure(list(Wavelength = 300:301, `1_corrected` = c(6L, 9L),
                                 `2_corrected` = c(8L, 5L)),
                class = "data.frame", row.names = c(NA, -2L))

df_2_corrected <- structure(list(Wavelength =300:301, `1_corrected` = c(4L, 9L),
                                 `2_corrected` = c(6L, 5L),
                                 `3_corrected` = c(9L, 7L)),
                class = "data.frame", row.names = c(NA, -2L))

df_corrected_list <- list(df_1_corrected, df_2_corrected)

actual data excerpt

Wavelength Background 1        1 Background 2       2 Background 3         3
       300     273290.0 337670.0     276740.0  397530     288500.0  367480.0
       301     299126.7 375143.3     299273.3  432250     310313.3  394796.7

I have read the lapply function would be used for this but i have never used it before, as I am quite the beginner in R. Help is much appreciated!

Ronak Shah · Accepted Answer · 2020-08-05T09:50:51.913

0

You can put the code in a function and apply it for each dataframe in list using lapply :

subtract_values <- function(df) {
  cols <- grep('^\\d+$', names(df), value = TRUE)
  new_cols <- paste0(cols, '_corrected')
  df[new_cols] <- df[cols] - df[paste0('Background ', cols)]
  df[c("Wavelength", new_cols)]
}

lapply(df_list, subtract_values)

#[[1]]
#  Wavelength 1_corrected 2_corrected
#1        300           6           8
#2        301           9           5

#[[2]]
#  Wavelength 1_corrected 2_corrected 3_corrected
#1        300           4           6           9
#2        301           9           5           7

edited Aug 05 '20 at 09:50

answered Aug 05 '20 at 09:03

Ronak Shah

377,200
20
156
213

Where did I mess up if I get the error: [.data.frame`(df, paste0("Background ", cols)) : undefined columns selected – Nuramon Aug 05 '20 at 09:12
I think you had the same error last time as well. Does it work for you for the test data that you have shared? Do you have the same column names as shown in your actual data? – Ronak Shah Aug 05 '20 at 09:33
It was not the same error, though I have figured out, that in the actual data the column names are not "Background_1" but rather "Background 1", with a whitespace. How do I denote that instead of 'Background_'? Just leaving a whitespace doesn't do the trick – Nuramon Aug 05 '20 at 09:49
It should if you really have column names as you say i.e "Background 1". I have also updated the answer. Also it is better not to have whitespaces in column names for exact same reason, it is difficult to select them. – Ronak Shah Aug 05 '20 at 09:56
I hate to sound unproductive, but it just doesn't work and I have only got it to work by replacing the whitespaces with _ in the data and use your original code – Nuramon Aug 05 '20 at 09:58
We can avoid all these conversation if you provide an example which is exactly as your real data because we can provide an answer only based on the data/description that you share. If I use this data `df_1 <- structure(list(Wavelength = 300:301, \`Background 1\` = c(5L, 3L ), \`1\` = 11:12, \`Background 2\` = 4:5, \`2\` = c(12L, 10L)), class = "data.frame", row.names = c(NA, -2L))` which has whitespace in the column names and then use `df_1[paste0('Background ', 1:2)]` it selects the columns properly and doesn't give the error. So I am not sure how exactly is your data different. – Ronak Shah Aug 05 '20 at 10:02
Please share your actual data using `dput` so I have exactly what you have. – Ronak Shah Aug 05 '20 at 10:15
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/219252/discussion-between-nuramon-and-ronak-shah). – Nuramon Aug 05 '20 at 10:18
as I have still not been able to figure out the issue, I am again turning to you. Could it be we are using different versions of R that is causing this problem for me? – Nuramon Aug 10 '20 at 09:33
I am not sure why do you get an error when it works for me. I don't think it is R version issue. Is it ok if you rename the columns and remove spaces from it? – Ronak Shah Aug 10 '20 at 09:42
If I manually change all whitespaces to underscores in Excel before loading into R, it works, if I change them with a function in R, it doesn't. I am a bit disappointed that it is that way, to be honest Regardless, thanks for the help – Nuramon Aug 10 '20 at 09:52
If you want to remove whitespace in R,after reading data into R, you can do`names(df) <- gsub('\\s', '', names(df))`. Also make sure to remove whitespace in the function for `Background` columns. `df[paste0('Background', cols)]`. Perhaps, maybe you have more than 1 whitespace in the column names? – Ronak Shah Aug 10 '20 at 09:55

Apply self-defined function on list of data frames in R

1 Answers1