0

I am working with the R programming language.

I am starting with these 3 data frames - these frames have different numbers of columns and different column names:

df_1 = data.frame(col1 = c("A", "B", "C"), col2 = c(2,4,6), col3 = c(5, "B", "F"))
df_2 = data.frame(col4 = c(5,6,7), col5 = c("A", "D", "Z"))
df_3 = data.frame(col6 = c("dog", "cat"), col7 = c("bear", "wolf"), col8 = c("lion", "tiger"), col9 = c("horse", "pig"), col10 = c("shark", "whale"))

My Question: Using these 3 data frames, I am trying to get the following output ("final") where columns from 3 data frames are stacked on top of each other based on their positions :

 final
  new_col_1 new_col_2 new_col_3 new_col_4 new_col_5 new_col_6
1         A         2         5      <NA>      <NA>      <NA>
2         B         4         B      <NA>      <NA>      <NA>
3         C         6         F      <NA>      <NA>      <NA>
4         5         A      <NA>      <NA>      <NA>      <NA>
5         6         D      <NA>      <NA>      <NA>      <NA>
6         7         Z      <NA>      <NA>      <NA>      <NA>
7       dog      bear      <NA>      lion     horse     shark
8       cat      wolf      <NA>     tiger       pig     whale

I tried to use the advice from this question here (Combine two data frames by rows (rbind) when they have different sets of columns) to solve my problem :

library(gtools)
smartbind(df_1, df_2, df_3)

    col1 col2 col3 col4 col5 col6 col7  col8  col9 col10
1:1    A    2    5   NA <NA> <NA> <NA>  <NA>  <NA>  <NA>
1:2    B    4    B   NA <NA> <NA> <NA>  <NA>  <NA>  <NA>
1:3    C    6    F   NA <NA> <NA> <NA>  <NA>  <NA>  <NA>
2:1 <NA>   NA <NA>    5    A <NA> <NA>  <NA>  <NA>  <NA>
2:2 <NA>   NA <NA>    6    D <NA> <NA>  <NA>  <NA>  <NA>
2:3 <NA>   NA <NA>    7    Z <NA> <NA>  <NA>  <NA>  <NA>
3:1 <NA>   NA <NA>   NA <NA>  dog bear  lion horse shark
3:2 <NA>   NA <NA>   NA <NA>  cat wolf tiger   pig whale

However, this code is not giving me the desired results - can someone please tell me what I am doing wrong and how I can fix this?

Thanks!

stats_noob
  • 5,401
  • 4
  • 27
  • 83
  • how do you mean "based on their positions"? Your addition of `df_3`, for instance, is inconsistent. If you go by "n-th column" (discarding the names), then the last values of `new_col_3` should never be `NA`; if you mean by column name, then ... the actual output where everything is a new column is correct, predictable, and justifiable. – r2evans May 04 '23 at 17:11
  • @r2evans: thank you for pointing this out ... i will fix my question – stats_noob May 04 '23 at 17:15

3 Answers3

2

Just as others have said, you need to standardize the column names and data types. One restriction of a data frame is that every element of a column must be the same data type. Here's a slightly different solution if you prefer:

library(dplyr)
library(purrr)
map_dfr(list(df_1,df_2,df_3),
        ~ {
          
          setNames(.,paste0("new_col_",1:ncol(.))) %>%
            mutate(across(everything(),as.character))
          
        })
Joe Robert
  • 84
  • 3
1

If you wish to have the table in the order that you firstly specified, then it is a matter of specifying the column names accordingly - I do not understand why each column is being given a separate name when you wish to have several values under the same column.

df_1 = data.frame(col1 = c("A", "B", "C"), col2 = c(2,4,6), col3 = c(5, "B", "F"))
df_2 = data.frame(col1 = c(5,6,7), col2 = c("A", "D", "Z"))
df_3 = data.frame(col1 = c("dog", "cat"), col2 = c("bear", "wolf"), col4 = c("lion", "tiger"), col5 = c("horse", "pig"), col6 = c("shark", "whale"))

library(gtools)
smartbind(df_1, df_2, df_3)

The columns are named in accordance to the "positions" that you would like each value to appear in, and running the above produces your desired result:

    col1 col2 col3  col4  col5  col6
1:1    A    2    5  <NA>  <NA>  <NA>
1:2    B    4    B  <NA>  <NA>  <NA>
1:3    C    6    F  <NA>  <NA>  <NA>
2:1    5    A <NA>  <NA>  <NA>  <NA>
2:2    6    D <NA>  <NA>  <NA>  <NA>
2:3    7    Z <NA>  <NA>  <NA>  <NA>
3:1  dog bear <NA>  lion horse shark
3:2  cat wolf <NA> tiger   pig whale
Michael Grogan
  • 973
  • 5
  • 10
1

Two problems:

  1. The names are different. Easily remedied, since you don't care about any of the original column names.
  2. The column classes are different. Easily remedied, we will convert everything to character. I don't think this is fully always desired, you can always use type.convert(., as.is=TRUE) as needed.

With this, we can use either dplyr::bind_rows or data.table::rbindlist to combine them.

frames <- list(df_1, df_2, df_3)
frames2 <- lapply(frames, function(z) {
  z <- setNames(z, paste0("new_col_", seq_along(z)))
  z[] <- lapply(z, as.character)
  z
})
out <- bind_rows(frames2)
out
#   new_col_1 new_col_2 new_col_3 new_col_4 new_col_5
# 1         A         2         5      <NA>      <NA>
# 2         B         4         B      <NA>      <NA>
# 3         C         6         F      <NA>      <NA>
# 4         5         A      <NA>      <NA>      <NA>
# 5         6         D      <NA>      <NA>      <NA>
# 6         7         Z      <NA>      <NA>      <NA>
# 7       dog      bear      lion     horse     shark
# 8       cat      wolf     tiger       pig     whale

### similarly
out <- rbindlist(frames2, use.names = TRUE, fill = TRUE)

If you want whatever might be "number-like" to return to this number-like format (will do nothing with this data), you can run:

out[] <- lapply(out, type.convert, as.is = TRUE)
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • @ R2evans: thank you so much for your answer! I am working on a follow-up question here - can you please take a look at if you have time? https://stackoverflow.com/questions/76175678/r-combining-data-frames-with-different-column-names-and-numbers-of-columns thank you so much! – stats_noob May 04 '23 at 20:56