I have several data frames in R with the following structure:

> df1
 messy_col_name1  messy_group_name1
 numeric data     "group1"
 ...              ...
 numeric data     "group1"

> df2
 messy_col_name2  messy_group_name2
 numeric data     "group2"
 ...              ...
 numeric data     "group2"
 .
 .
 .
> dfN
 messy_col_nameN  messy_group_nameN
 numeric data     "groupN"
 ...              ...
 numeric data     "groupN"

All of these data frames have 2 columns. The first column holds real (numeric) values; the second column holds the group name as a factor.

I was wondering whether there is an efficient way to bind these data frames by row without relabelling the column names on each data frame. The final object should also be a data frame. The aim is to perform an ANOVA using aov(). The end result should appear like this:

> df.combined
 col_name      group
 numeric_data  "group1"
 ...           ...
 numeric_data  "group1"
 numeric_data  "group2"
 ...           ...
 numeric_data  "group2"
 ...           ...
 numeric_data  "groupN"
 ...           ...
 numeric_data  "groupN"

I was not successful using common functions like rbind(), rbind.fill() or bind_rows().

I examined the following posts, but I was not able to solve this issue:

Many dataframes, different row lengths, similiar columns and dataframe titles, how to bind?

R: rbind data frames with a different column name

The following post came close:

How to rbind different data frames with different column names?

However, the solution in that post is not efficient when there are many data frames.

NM_

1 Answer


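Binding data frames by row does require that they have the same column names: rbind() throws an error when the names differ, and bind_rows() matches columns by name, so differently named columns end up as separate columns padded with NA. Relabelling each data frame is likely as efficient as any other solution.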

I would make a list of data frames; this allows the use of lapply to rename the columns. Then you can use do.call(rbind) or dplyr::bind_rows().

For example:

library(magrittr) # provides the %>% pipe
# Give every data frame the same column names, then bind them by row
df.combined <- list(df1, df2, df3) %>%
  lapply(function(x) setNames(x, c("col_name", "group"))) %>%
  do.call(rbind, .)

Or using dplyr:

library(dplyr) # also re-exports the %>% pipe
# Same renaming step, but bind with dplyr::bind_rows()
df.combined <- list(df1, df2, df3) %>%
  lapply(function(x) setNames(x, c("col_name", "group"))) %>%
  bind_rows()
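
If there are many data frames, typing every name into list() gets tedious. Here is a minimal base R sketch that collects them with ls() and mget() and then fits the ANOVA. It assumes the data frames sit in the current environment and are named df1, df2, ... (the name pattern and the toy data below are assumptions for illustration only):

```r
# Toy data standing in for the real data frames (hypothetical values)
set.seed(1)
df1 <- data.frame(messy_col_name1 = rnorm(5), messy_group_name1 = factor("group1"))
df2 <- data.frame(messy_col_name2 = rnorm(5), messy_group_name2 = factor("group2"))
df3 <- data.frame(messy_col_name3 = rnorm(5), messy_group_name3 = factor("group3"))

# Collect every object whose name looks like df<number> into a named list
df_names <- ls(pattern = "^df[0-9]+$")
df_list <- mget(df_names)

# Rename the columns of each data frame, then bind them by row
df_list <- lapply(df_list, setNames, c("col_name", "group"))
df.combined <- do.call(rbind, df_list)
rownames(df.combined) <- NULL # drop the "df1.1"-style row names

# One-way ANOVA on the combined data
fit <- aov(col_name ~ group, data = df.combined)
summary(fit)
```

Note that ls() returns the names in alphabetical order (df10 sorts before df2), which changes the row order of the result but not the ANOVA.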

I would bet that there is also an elegant solution using one of the mapping functions in the purrr package.
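
One way this might look with purrr is sketched below, assuming purrr and dplyr are both installed. purrr::map() takes the place of lapply() and dplyr::bind_rows() does the binding. The toy data frames are hypothetical and store the group as a character column to keep the example simple:

```r
library(purrr)
library(dplyr)

# Toy data standing in for the real data frames (hypothetical values)
df1 <- data.frame(messy_col_name1 = c(1.2, 3.4), messy_group_name1 = "group1")
df2 <- data.frame(messy_col_name2 = c(5.6, 7.8), messy_group_name2 = "group2")

# Rename the columns of each data frame with map(), then bind by row
df.combined <- list(df1, df2) %>%
  map(~ setNames(.x, c("col_name", "group"))) %>%
  bind_rows()
```

This is not really shorter than the lapply() version; it mostly comes down to which style you prefer.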

neilfws