
I am working with the R programming language.

I have the following 3 data frames:

df_1 = data.frame(col1 = c("A", "B", "C"), col2 = c(2,4,6), col3 = c(5, "B", "F"))
df_2 = data.frame(col4 = c(5,6,7), col5 = c("A", "D", "Z"))
df_3 = data.frame(col6 = c("dog", "cat"), col7 = c("bear", "wolf"), col8 = c("lion", "tiger"), col9 = c("horse", "pig"), col10 = c("shark", "whale"))

I want to combine them into one big data frame that looks like this:

#   new_col_1 new_col_2 new_col_3 new_col_4 new_col_5
# 1         A         2         5      <NA>      <NA>
# 2         B         4         B      <NA>      <NA>
# 3         C         6         F      <NA>      <NA>
# 4         5         A      <NA>      <NA>      <NA>
# 5         6         D      <NA>      <NA>      <NA>
# 6         7         Z      <NA>      <NA>      <NA>
# 7       dog      bear      lion     horse     shark
# 8       cat      wolf     tiger       pig     whale

Based on the answer to a previous question of mine (R: Combining Data Frames With Different Column Names and Numbers of Columns), I can do this with the following code:

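library(dplyr)  # provides bind_rows()

frames = list(df_1, df_2, df_3)

frames2 <- lapply(frames, function(z) {
    z <- setNames(z, paste0("new_col_", seq_along(z)))  # rename columns to new_col_1, new_col_2, ...
    z[] <- lapply(z, as.character)                      # convert every column to character
    z
})
out <- bind_rows(frames2)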

My question: in reality, I have many data frames that I am trying to use this code on (they are already loaded into R): df_1, df_2, df_3, ..., df_500. However, when I run the code above on all of them, it fails with an error, so I suspect that one of these objects contains something the code cannot handle:

# DOES NOT RUN
frames = list(df_1, df_2, df_3, ..., df_500)

frames2 <- lapply(frames, function(z) {
    z <- setNames(z, paste0("new_col_", seq_along(z)))
    z[] <- lapply(z, as.character)
    z
})

Error in names(object) <- nm : attempt to set an attribute on NULL
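The error message says that setNames() tried to set names on NULL, which suggests (but does not prove) that at least one element of frames is NULL rather than a data frame, for example an object that was never created or was overwritten. A minimal sketch for checking that before changing the loop, using hypothetical objects df_ok1, df_bad and df_ok2 in place of the real data:

```r
# Hypothetical stand-ins: two valid data frames and one NULL object (df_bad)
df_ok1 <- data.frame(col1 = c("A", "B"))
df_ok2 <- data.frame(col4 = c(5, 6))
df_bad <- NULL

# list() keeps the NULL as element 2, just like list(df_1, ..., df_500) would
frames <- list(df_ok1, df_bad, df_ok2)

# Positions of every element that is not a data frame (NULL, vectors, ...)
not_df <- which(!vapply(frames, is.data.frame, logical(1)))
not_df
#> [1] 2
```

Running the same vapply() check on the real list of 500 objects would point at the positions that need a closer look.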

I tried to debug this manually. For instance, I noticed that adding a few more data frames to the list still works:

# WORKS
frames = list(df_1, df_2, df_3, df_4, df_5, df_6)


frames2 <- lapply(frames, function(z) {
    z <- setNames(z, paste0("new_col_", seq_along(z)))
    z[] <- lapply(z, as.character)
    z
})
out <- bind_rows(frames2)

Is there a way to wrap this code in something like tryCatch so that, when a data frame causes an error, the code keeps running and records which data frame caused the problem? Or is there a better way to approach this in general?

Thanks!

stats_noob

1 Answer


It is not entirely clear to me what you are trying to do (especially without a reproducible example of the error), but you could try this:

df_1 = data.frame(col1 = c("A", "B", "C"), col2 = c(2,4,6), col3 = c(5, "B", "F"))
df_2 = data.frame(col4 = c(5,6,7), col5 = c("A", "D", "Z"))
df_3 = data.frame(col6 = c("dog", "cat"), col7 = c("bear", "wolf"), col8 = c("lion", "tiger"), col9 = c("horse", "pig"), col10 = c("shark", "whale"))

frames = list(df_1, df_2, df_3)

function_to_try <- function(z) {
  z <- setNames(z, paste0("new_col_", seq_along(z)))
  z[] <- lapply(z, as.character)
  z
}


frames2 <- lapply(
  frames, 
  \(x) tryCatch(function_to_try(x), error = function(e) NA)
  )
out <- dplyr::bind_rows(frames2)
out
#>   new_col_1 new_col_2 new_col_3 new_col_4 new_col_5
#> 1         A         2         5      <NA>      <NA>
#> 2         B         4         B      <NA>      <NA>
#> 3         C         6         F      <NA>      <NA>
#> 4         5         A      <NA>      <NA>      <NA>
#> 5         6         D      <NA>      <NA>      <NA>
#> 6         7         Z      <NA>      <NA>      <NA>
#> 7       dog      bear      lion     horse     shark
#> 8       cat      wolf     tiger       pig     whale

Created on 2023-05-04 with reprex v2.0.2

You may need to replace error = function(e) NA with error = function(e) NULL. With NA as the fallback, you can at least identify which data frames are not working with which(is.na(frames2)).
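If you switch to NULL, which(is.na(frames2)) no longer finds the failures, because is.na() on a list only flags elements that are a single NA; is.null() does that job instead. A minimal sketch under that assumption, with a hypothetical frames list whose second element is NULL. It relies on dplyr::bind_rows() ignoring NULL inputs, so no filtering step is needed before binding:

```r
function_to_try <- function(z) {
  z <- setNames(z, paste0("new_col_", seq_along(z)))  # rename to new_col_*
  z[] <- lapply(z, as.character)                      # all columns as character
  z
}

# Hypothetical input: element 2 is NULL and makes function_to_try() fail
frames <- list(
  data.frame(col1 = c("A", "B", "C")),
  NULL,
  data.frame(col4 = c(5, 6))
)

# Return NULL (instead of NA) for any element that raises an error
frames2 <- lapply(
  frames,
  function(x) tryCatch(function_to_try(x), error = function(e) NULL)
)

# Positions of the elements that failed
failed <- which(vapply(frames2, is.null, logical(1)))

# bind_rows() skips NULL entries, so the failures simply drop out
out <- dplyr::bind_rows(frames2)
```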

You can then leave them out when binding:

out <- dplyr::bind_rows(frames2[!is.na(frames2)])
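To record which file caused each problem by name (rather than by position) together with the actual error message, one option is to build the list with mget() and let tryCatch() return the condition object itself. This is a sketch, not part of the original answer; df_1, df_2 and df_3 below are hypothetical stand-ins, with df_2 set to NULL to simulate a bad file. For the real data the name vector would be paste0("df_", 1:500):

```r
function_to_try <- function(z) {
  z <- setNames(z, paste0("new_col_", seq_along(z)))
  z[] <- lapply(z, as.character)
  z
}

# Hypothetical objects; df_2 is NULL and plays the role of the problem file
df_1 <- data.frame(col1 = c("A", "B", "C"))
df_2 <- NULL
df_3 <- data.frame(col6 = c("dog", "cat"), col7 = c("bear", "wolf"))

# mget() returns a list named after the objects, so failures keep their names
frames <- mget(paste0("df_", 1:3), envir = environment())

# On failure, keep the condition object so its message can be reported later
results <- lapply(frames, function(x) {
  tryCatch(function_to_try(x), error = function(e) e)
})

# TRUE for every element whose processing raised an error
failed <- vapply(results, inherits, logical(1), what = "error")

# Named character vector: object name -> error message
problems <- vapply(results[failed], conditionMessage, character(1))

# Bind only the successfully processed data frames
out <- dplyr::bind_rows(results[!failed])
```

Here problems ends up with one entry, named df_2, and out holds the five rows from df_1 and df_3.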
Baraliuh