rbind dataframes with a different column name

Question

I have 12 data frames, each with 7 columns: 6 have the same names in every data frame and 1 is different. When I call rbind() I get:

Error in match.names(clabs, names(xi)) : 
  names do not match previous names

The column that differs is "goal1Completions". There are 12 of these goal-completion columns, named "goal1Completions", "goal2Completions", "goal3Completions", and so on.

The best way I can think of is to rename that column to "GoalsCompletions" in every data frame and then call rbind().

Is there a simpler way?

Searching Google, I found the package "gtools", which has a function called smartbind(). However, when I try to look at the resulting data frame with View() after using smartbind(), my R session crashes...

My data (an example of the first data frame):

       date      source     medium   campaign   goal1Completions    ad.cost           Goal
1   2014-10-01  (direct)    (none)   (not set)          0           0.0000            Vida
2   2014-10-01   Master      email     CAFRE            0           0.0000            Vida
3   2014-10-01  apeseg      referral (not set)          0           0.0000            Vida

Do these 12 dataset objects follow a naming pattern, e.g. `df1, df2, df3, ...`? If so, it may be better to put them in a list and then use rbindlist, i.e. `rbindlist(mget(paste0('df',1:12)))` – akrun, Feb 17 '15 at 17:27
@akrun, yes the pattern is: `Goal1_Costo,Goal2_Costo,... Goal12_Costo`. If you need to update your answer, please do. — Omar Gonzales, Feb 17 '15 at 17:37

LyzandeR · Answer 1 · 2015-02-17T17:14:20.550

score 26

My favourite use of mapply:

Example Data

a <- data.frame(a=runif(5), b=runif(5))
> a
          a         b
1 0.8403348 0.1579255
2 0.4759767 0.8182902
3 0.8091875 0.1080651
4 0.9846333 0.7035959
5 0.2153991 0.8744136

and b

b <- data.frame(c=runif(5), d=runif(5))
> b
          c         d
1 0.7604137 0.9753853
2 0.7553924 0.1210260
3 0.7315970 0.6196829
4 0.5619395 0.1120331
5 0.5711995 0.7252631

Solution

Using mapply:

> mapply(c, a,b)    #or as.data.frame(mapply(c, a,b)) for a data.frame
              a         b
 [1,] 0.8403348 0.1579255
 [2,] 0.4759767 0.8182902
 [3,] 0.8091875 0.1080651
 [4,] 0.9846333 0.7035959
 [5,] 0.2153991 0.8744136
 [6,] 0.7604137 0.9753853
 [7,] 0.7553924 0.1210260
 [8,] 0.7315970 0.6196829
 [9,] 0.5619395 0.1120331
[10,] 0.5711995 0.7252631

And based on @Marat's comment below:

You can also do data.frame(mapply(c, a, b, SIMPLIFY=FALSE)) or, alternatively, data.frame(Map(c,a,b)) to avoid the double data.frame-matrix conversion.
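Building on the Map() variant, here is a minimal sketch that handles a whole list of data frames. It also adds the column-count check asked for in MadmanLee's comment, because Map/mapply could otherwise recycle columns when the frames have different widths. The helper name `bind_by_position` and the small `a`/`b` frames are hypothetical. Column names come from the first data frame; each column is combined with c(), so column types follow c()'s usual rules instead of everything being coerced to one matrix type as with mapply(c, a, b).

```r
# Hypothetical helper: stack data frames by column position, keeping the
# column names of the first data frame.
bind_by_position <- function(dfs) {
  dfs <- unname(dfs)  # list names would otherwise become argument names inside c()
  n_cols <- vapply(dfs, ncol, integer(1))
  if (any(n_cols != n_cols[1])) {
    stop("all data frames must have the same number of columns")
  }
  cols <- do.call(Map, c(list(f = c), dfs))  # same as Map(c, df1, df2, ...)
  data.frame(cols, check.names = FALSE, stringsAsFactors = FALSE)
}

# Small deterministic example frames with different column names
a <- data.frame(x = 1:2, y = c("p", "q"), stringsAsFactors = FALSE)
b <- data.frame(u = 3:4, v = c("r", "s"), stringsAsFactors = FALSE)
res <- bind_by_position(list(a, b))
str(res)
```

With the OP's objects, something like `bind_by_position(mget(paste0("Goal", 1:12, "_Costo")))` should work, assuming the differing column sits in the same position in every frame.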

edited Feb 17 '15 at 17:14

answered Feb 17 '15 at 16:58

LyzandeR


It seems very clever. Is the "c" in mapply(c,a,b) for concatenate? Does it concatenate "a" and "b" and keep the column names from "a"? – Omar Gonzales Feb 17 '15 at 17:03

You could avoid double data.frame-matrix conversion by `data.frame(mapply(c, a, b, SIMPLIFY=FALSE))` or, alternatively, `data.frame(Map(c,a,b))` – Marat Talipov Feb 17 '15 at 17:03
@OmarGonzales Yes it is the usual concatenate function and it does keep the column names from a. Each time it concatenates the elements (i.e. columns) of the two data.frames and returns a matrix in the end. – LyzandeR Feb 17 '15 at 17:09
This can be dangerous, since it will also combine data frames with different numbers of columns. Otherwise it would have been perfect; I'm sure a simple if statement would take care of that, though. – MadmanLee Apr 24 '19 at 05:00
Very late to the party, but `purrr::map2_df(a, b, c)` works without wrapping the result in `data.frame`, although I don't know whether it avoids the double conversion internally. Like @MaratTalipov's suggestion, it keeps the column types of the first data frame, whereas mapply coerces them (in my case everything became character when mixing dbl or date columns with chr columns). – Mooks Feb 05 '21 at 13:39

akrun · Accepted Answer · 2015-02-17T17:40:34.800

score 17

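You could use `rbindlist`, which here binds the data frames by position and therefore accepts different column names (the names of the first data frame are kept). Using @LyzandeR's data: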

library(data.table) #data.table_1.9.5
rbindlist(list(a,b))
#            a         b
# 1: 0.8403348 0.1579255
# 2: 0.4759767 0.8182902
# 3: 0.8091875 0.1080651
# 4: 0.9846333 0.7035959
# 5: 0.2153991 0.8744136
# 6: 0.7604137 0.9753853
# 7: 0.7553924 0.1210260
# 8: 0.7315970 0.6196829
# 9: 0.5619395 0.1120331
#10: 0.5711995 0.7252631

Update

Based on the object names of the 12 datasets (i.e. 'Goal1_Costo', 'Goal2_Costo', ..., 'Goal12_Costo'):

 nm1 <- paste(paste0('Goal', 1:12), 'Costo', sep="_")
 #or using `sprintf`
 #nm1 <- sprintf('%s%d_%s', 'Goal', 1:12, 'Costo')
 rbindlist(mget(nm1))
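If you would rather not depend on data.table, the same mget() idea can be sketched in base R. This assumes, as in the question, that the differing column is in the same position in every data frame; the three tiny `Goal*_Costo` objects below are made-up stand-ins for the real ones. (In newer data.table releases I believe rbindlist() defaults to `use.names = "check"`, which prints a message when names differ; passing `use.names = FALSE` asks for positional binding explicitly.)

```r
# Hypothetical stand-ins for the OP's objects Goal1_Costo ... Goal3_Costo
Goal1_Costo <- data.frame(date = "2014-10-01", goal1Completions = 1, ad.cost = 0)
Goal2_Costo <- data.frame(date = "2014-10-02", goal2Completions = 2, ad.cost = 0)
Goal3_Costo <- data.frame(date = "2014-10-03", goal3Completions = 3, ad.cost = 0)

nm1 <- sprintf("Goal%d_Costo", 1:3)  # use 1:12 for the real data
dfs <- mget(nm1)                     # named list of the data frames

# Give every frame the first frame's column names (positional match), then stack
dfs <- lapply(dfs, setNames, names(dfs[[1]]))
res <- do.call(rbind, unname(dfs))   # unname() keeps row names as plain numbers
res
```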

edited Feb 17 '15 at 17:40

answered Feb 17 '15 at 17:06

akrun



Doesn't dplyr have a similar function? I'm looking for it; if someone knows, please post. – Omar Gonzales Feb 17 '15 at 17:23

@OmarGonzales It has `bind_rows`, but still the column names will be a problem. So, instead of 2 columns, the output will be 4. According to `?bind_rows` `When row-binding, columns are matched by name, and any values that don't match will be filled with NA.` – akrun Feb 17 '15 at 17:24
Thanks to all, but I ended up using this, as it seems simpler. However, I'll need to investigate the mapply functions a little more... they seem very powerful. – Omar Gonzales Feb 17 '15 at 17:31

@OmarGonzales One advantage of using `rbindlist` is its speed. – akrun Feb 17 '15 at 17:32

score 7 · Answer 3 · answered Feb 17 '15 at 16:55

I would rename the columns. This is very easy with names() if the columns are in the same order.

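df1 <- data.frame(one=1:10,two=11:20,three=21:30)
df2 <- data.frame(four=31:40,five=41:50,six=51:60)

names(df2) <- names(df1)
rbind(df1,df2)

or, without modifying df2 (setNames() is in the stats package, which is loaded by default; data.table's setnames() would also work but renames df2 in place):

df1 <- data.frame(one=1:10,two=11:20,three=21:30)
df2 <- data.frame(four=31:40,five=41:50,six=51:60)

rbind(df1,setNames(df2,names(df1)))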

Result:

   one two three
1    1  11    21
2    2  12    22
3    3  13    23
4    4  14    24
5    5  15    25
6    6  16    26
7    7  17    27
8    8  18    28
9    9  19    29
10  10  20    30
11  31  41    51
12  32  42    52
13  33  43    53
14  34  44    54
15  35  45    55
16  36  46    56
17  37  47    57
18  38  48    58
19  39  49    59
20  40  50    60

The OP mentioned 12 datasets, so probably: `df3 <- data.frame(seven=61:70,eight=71:80,nine=81:90);res <- do.call(rbind,lapply(mget(paste0('df',1:3)), function(x) {colnames(x) <- colnames(df1);x})); row.names(res) <- NULL` – akrun, Feb 17 '15 at 17:17
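If the columns are not guaranteed to be in the same order, renaming only the differing column (the approach the question itself proposes) is safer, because rbind() on data frames matches columns by name. A minimal sketch, assuming the differing column always matches the pattern `goal<digits>Completions`; the helper `unify_goal_col`, the target name "goalCompletions", and the three trimmed-down frames are hypothetical.

```r
# Hypothetical example frames mimicking the OP's layout (fewer columns for brevity)
df1 <- data.frame(date = "2014-10-01", source = "(direct)", goal1Completions = 1,
                  ad.cost = 0, stringsAsFactors = FALSE)
df2 <- data.frame(date = "2014-10-01", source = "Master", goal2Completions = 2,
                  ad.cost = 0, stringsAsFactors = FALSE)
# Different column order on purpose: rbind() matches data frame columns by name
df3 <- data.frame(date = "2014-10-01", source = "apeseg", ad.cost = 0,
                  goal3Completions = 3, stringsAsFactors = FALSE)

# Rename the goalNCompletions column of one data frame to a common name
unify_goal_col <- function(x, new_name = "goalCompletions") {
  names(x) <- sub("^goal[0-9]+Completions$", new_name, names(x))
  x
}

dfs <- lapply(list(df1, df2, df3), unify_goal_col)
res <- do.call(rbind, dfs)
res
```

For the OP's objects, `lapply(mget(paste0("Goal", 1:12, "_Costo")), unify_goal_col)` should produce the list to pass to do.call(rbind, ...).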

score 1 · Answer 4 · answered Sep 02 '20 at 22:10

Another base R approach if you have data.frames with different column names:

# Create a list of data frames
df_list <- list()
df_list[[1]] <- data.frame(x = 1, y = paste0("y1", 1:3))
df_list[[2]] <- data.frame(x = 2, y = paste0("y2", 1:4))
df_list[[3]] <- data.frame(x = 3, y = paste0("y3", 1:5), z = "z3")
df_list
#> [[1]]
#>   x   y
#> 1 1 y11
#> 2 1 y12
#> 3 1 y13
#> 
#> [[2]]
#>   x   y
#> 1 2 y21
#> 2 2 y22
#> 3 2 y23
#> 4 2 y24
#> 
#> [[3]]
#>   x   y  z
#> 1 3 y31 z3
#> 2 3 y32 z3
#> 3 3 y33 z3
#> 4 3 y34 z3
#> 5 3 y35 z3

# This works when the column names are the same
do.call(rbind, df_list[1:2])
#>   x   y
#> 1 1 y11
#> 2 1 y12
#> 3 1 y13
#> 4 2 y21
#> 5 2 y22
#> 6 2 y23
#> 7 2 y24

# but fails when the column names differ
do.call(rbind, df_list)
#> Error in rbind(deparse.level, ...): numbers of columns of arguments do not match

# This can fill the unmatched columns with NAs without
# depending on other packages:
all_cols <- unique(unlist(lapply(df_list, colnames)))
Reduce(rbind, lapply(df_list, function(x) {
  x[, setdiff(all_cols, names(x))] <- NA  # add the columns this frame lacks
  x
}))
#>    x   y    z
#> 1  1 y11 <NA>
#> 2  1 y12 <NA>
#> 3  1 y13 <NA>
#> 4  2 y21 <NA>
#> 5  2 y22 <NA>
#> 6  2 y23 <NA>
#> 7  2 y24 <NA>
#> 8  3 y31   z3
#> 9  3 y32   z3
#> 10 3 y33   z3
#> 11 3 y34   z3
#> 12 3 y35   z3

score 0 · Answer 5 · answered Apr 27 '19 at 01:46

Here is a possible tidyverse solution. I created 3 example data frames based on your description.

df1 <- read.table(text ="date,source,medium,campaign,goal1Completions,ad.cost,Goal
2014-10-01,(direct),(none),(notset),1,0.0000,Vida
2014-10-01,Master,email,CAFRE,2,0.0000,Vida
2014-10-01,apeseg,referral,(not set),3,0.0000,vida",sep = ",",header=TRUE) 

df2 <- read.table(text ="date,source,medium,campaign,goal2Completions,ad.cost,Goal
2014-10-01,(direct),(none),(notset),4,0.0000,Vida
2014-10-01,Master,email,CAFRE,5,0.0000,Vida
2014-10-01,apeseg,referral,(not set),6,0.0000,vida",sep = ",",header=TRUE) 

df3 <- read.table(text ="date,source,medium,campaign,goal3Completions,ad.cost,Goal
2014-10-01,(direct),(none),(notset),7,0.0000,Vida
2014-10-01,Master,email,CAFRE,8,0.0000,Vida
2014-10-01,apeseg,referral,(not set),9,0.0000,vida",sep = ",",header=TRUE) 

> df1
        date   source   medium  campaign goal1Completions ad.cost Goal
1 2014-10-01 (direct)   (none)  (notset)                1       0 Vida
2 2014-10-01   Master    email     CAFRE                2       0 Vida
3 2014-10-01   apeseg referral (not set)                3       0 vida
> df2
        date   source   medium  campaign goal2Completions ad.cost Goal
1 2014-10-01 (direct)   (none)  (notset)                4       0 Vida
2 2014-10-01   Master    email     CAFRE                5       0 Vida
3 2014-10-01   apeseg referral (not set)                6       0 vida
> df3
        date   source   medium  campaign goal3Completions ad.cost Goal
1 2014-10-01 (direct)   (none)  (notset)                7       0 Vida
2 2014-10-01   Master    email     CAFRE                8       0 Vida
3 2014-10-01   apeseg referral (not set)                9       0 vida

library(dplyr)
library(tidyselect)
library(purrr)

bind_rows(df1,df2,df3) %>%
   mutate(goalCompletions = reduce(select_at(.,vars(matches("goal[[:digit:]]+Completions"))),coalesce)) %>%
   select_at(vars(-matches("goal[[:digit:]]+Completions")))

        date   source   medium  campaign ad.cost Goal goalCompletions
1 2014-10-01 (direct)   (none)  (notset)       0 Vida               1
2 2014-10-01   Master    email     CAFRE       0 Vida               2
3 2014-10-01   apeseg referral (not set)       0 vida               3
4 2014-10-01 (direct)   (none)  (notset)       0 Vida               4
5 2014-10-01   Master    email     CAFRE       0 Vida               5
6 2014-10-01   apeseg referral (not set)       0 vida               6
7 2014-10-01 (direct)   (none)  (notset)       0 Vida               7
8 2014-10-01   Master    email     CAFRE       0 Vida               8
9 2014-10-01   apeseg referral (not set)       0 vida               9
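For readers who want to see what the reduce(..., coalesce) step does without the tidyverse, here is a base R sketch. It assumes the data frames have already been stacked with one NA-filled goalNCompletions column per goal (the layout bind_rows() produces); the names `stacked` and `first_non_na` are hypothetical.

```r
# Hypothetical stacked result: one NA-filled column per goal, as after bind_rows()
stacked <- data.frame(
  date = rep("2014-10-01", 3),
  goal1Completions = c(1, NA, NA),
  goal2Completions = c(NA, 2, NA),
  goal3Completions = c(NA, NA, 3)
)

goal_cols <- grep("^goal[0-9]+Completions$", names(stacked), value = TRUE)

# Base R coalesce of two vectors: keep x where it is not NA, otherwise take y.
# Note that ifelse() drops classes such as Date; fine for numeric counts.
first_non_na <- function(x, y) ifelse(is.na(x), y, x)

# Fold over the goal columns to get the first non-NA value in each row
stacked$goalCompletions <- Reduce(first_non_na, stacked[goal_cols])
stacked <- stacked[setdiff(names(stacked), goal_cols)]
stacked
```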
