169

Given two dataframes a and b:

> a
           a           b           c
1 -0.2246894 -1.48167912 -1.65099363
2  0.5559320 -0.87898575 -0.15634590
3  1.8469466 -0.01487524 -0.53098215
4 -0.6875051  0.23880967  0.01824621
5 -0.6735163  0.75485292  0.44154092


> b
           a          c
1  0.4287284 -0.3295925
2  0.5201492  0.3341251
3 -2.6355570  1.7916780
4 -1.3645337  1.3642276
5 -0.4954542 -0.6660001

Is there a simple way to concatenate these so as to return a new data frame of the form below?

> new
           a                   b           c
1  -0.2246894   -1.48167912106676 -1.65099363
2   0.5559320  -0.878985746842256 -0.15634590
3   1.8469466 -0.0148752354840942 -0.53098215
4  -0.6875051   0.238809666690982  0.01824621
5  -0.6735163   0.754852923524198  0.44154092
6   0.4287284                  NA -0.32959248
7   0.5201492                  NA  0.33412510
8  -2.6355570                  NA  1.79167801
9  -1.3645337                  NA  1.36422764
10 -0.4954542                  NA -0.66600006

I want to stack the two data frames row-wise, matching columns by header and inserting NA in the rows that come from b wherever b lacks a column that a has.

dfrankow
Darren J. Fitzpatrick

5 Answers

270

You want "rbind".

b$b <- NA
new <- rbind(a, b)

rbind requires both data frames to have the same set of column names; it matches columns by name, so their order does not matter.

The first line adds a column b, filled with NA, to data frame b.

Results

> a <- data.frame(a=c(0,1,2), b=c(3,4,5), c=c(6,7,8))
> a
  a b c
1 0 3 6
2 1 4 7
3 2 5 8
> b <- data.frame(a=c(9,10,11), c=c(12,13,14))
> b
   a  c
1  9 12
2 10 13
3 11 14
> b$b <- NA
> b
   a  c  b
1  9 12 NA
2 10 13 NA
3 11 14 NA
> new <- rbind(a,b)
> new
   a  b  c
1  0  3  6
2  1  4  7
3  2  5  8
4  9 NA 12
5 10 NA 13
6 11 NA 14
dfrankow
  • 14
    If you're getting the union of more than 2 data frames, you can use `Reduce(rbind, list_of_data_frames)` to mash them all together! – Yourpalal Aug 13 '15 at 21:12
  • 1
    If your `rbind` is coming from base for some strange reason: I used `rbind.data.frame` – Boern May 02 '18 at 12:42
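
A minimal sketch of the `Reduce` suggestion from the comments above, assuming every data frame in the list already has the same columns. The list `frames` and its contents are made up for illustration. `do.call(rbind, frames)` is a common alternative that makes a single `rbind` call instead of binding pairwise.

```r
# Hypothetical input: three data frames with identical columns
frames <- list(
  data.frame(a = c(0, 1), c = c(6, 7)),
  data.frame(a = c(9, 10), c = c(12, 13)),
  data.frame(a = 20, c = 30)
)

# Pairwise: rbind(rbind(frames[[1]], frames[[2]]), frames[[3]])
combined_reduce <- Reduce(rbind, frames)

# Single call: rbind(frames[[1]], frames[[2]], frames[[3]])
combined_docall <- do.call(rbind, frames)
```

If the frames do not share the same columns, the missing ones would still need to be added first, as in the answer above.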
37

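You can use bind_rows from the dplyr package, which matches columns by name and fills columns missing from either data frame with NA:

library(dplyr)
bind_rows(a, b)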

  • 3
    Unlike `cbind` (`rbind`), this function does not change the type of all the columns (rows) to `factor` if a vector of characters is present. – Azim Apr 12 '18 at 14:40
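
As a rough illustration of the dplyr approach, here is a sketch using the small example frames from the accepted answer. It assumes dplyr is installed; the labels `a_frame` and `b_frame` and the column name `source` are arbitrary choices for this example. The optional `.id` argument records which input each row came from.

```r
library(dplyr)

a <- data.frame(a = c(0, 1, 2), b = c(3, 4, 5), c = c(6, 7, 8))
b <- data.frame(a = c(9, 10, 11), c = c(12, 13, 14))

# Column b is absent from the second frame, so its rows get NA there.
# Naming the arguments lets .id label each row with its origin.
new <- bind_rows(a_frame = a, b_frame = b, .id = "source")
```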
36

Try the plyr package; rbind.fill accepts any number of data frames and fills missing columns with NA:

library(plyr)
rbind.fill(a, b)
zx8754
Rnoob
  • 11
    Avoid using external packages for simple tasks. – Fernando Jan 21 '16 at 00:18
  • 30
    Clearer and easier than hacking in extra columns just to please rbind; this is the right way forward. Avoiding extremely common packages like `plyr` when it offers the right tools for the job is simply not sensible. – Jack Aidley Jun 05 '17 at 18:24
  • 2
    This function automatically does the factor merging. It's significantly better than the accepted answer. `plyr` is an awfully common package. – ABCD Nov 28 '17 at 05:17
14

Here's a simple little function that will rbind two datasets together after auto-detecting what columns are missing from each and adding them with all NAs.

For whatever reason this returns MUCH faster on larger datasets than using the merge function.

fastmerge <- function(d1, d2) {
  d1.names <- names(d1)
  d2.names <- names(d2)

  # columns in d1 but not in d2
  d2.add <- setdiff(d1.names, d2.names)

  # columns in d2 but not in d1
  d1.add <- setdiff(d2.names, d1.names)

  # add blank columns to d2
  if(length(d2.add) > 0) {
    for(i in 1:length(d2.add)) {
      d2[d2.add[i]] <- NA
    }
  }

  # add blank columns to d1
  if(length(d1.add) > 0) {
    for(i in 1:length(d1.add)) {
      d1[d1.add[i]] <- NA
    }
  }

  return(rbind(d1, d2))
}
Mike Monteiro
  • 2
    This little function is dynamite. – Dirk Jul 10 '17 at 13:12
  • Nice. I just wanted to post the same answer :-) . One improvement: @Anton cast the `NA` to `double` in his answer. It would be nice if the new column had the same type as the existing column in the other data frame. Maybe via `mode(d2[d2.add[i]]) <- mode(d1[d2.add[i]])`. But I am not sure whether this is the appropriate way. – daniel.heydebreck Aug 09 '17 at 11:12
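
One possible way to get the type matching asked about in the last comment, sketched in base R. The helper names `add_missing_cols` and `typed_rbind` are hypothetical and not part of the original answer. The idea is that indexing a vector with `NA_integer_` returns a single NA of that vector's own type, so the new column is created with the same type as its counterpart in the other data frame.

```r
# Hypothetical helper: add every column that `template` has and `target`
# lacks, filled with NA of the same type as the column in `template`.
add_missing_cols <- function(target, template) {
  for (nm in setdiff(names(template), names(target))) {
    # x[NA_integer_] is a length-one NA that keeps the type of x;
    # rep() stretches it to one value per row of `target`.
    target[[nm]] <- rep(template[[nm]][NA_integer_], nrow(target))
  }
  target
}

# Hypothetical wrapper: fill in both directions, then bind by column name.
typed_rbind <- function(d1, d2) {
  rbind(add_missing_cols(d1, d2), add_missing_cols(d2, d1))
}

a <- data.frame(a = c(0, 1, 2), b = c(3, 4, 5), c = c(6, 7, 8))
b <- data.frame(a = c(9, 10, 11), c = c(12, 13, 14))
new <- typed_rbind(a, b)
```

This keeps the structure of `fastmerge` but avoids adding a logical `NA` column that is only coerced later.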
5

You can use rbind, but it requires both tables to have the same columns, so add the missing column first:

b$b<-as.double(NA) #keeping numeric format is essential for further calculations
new<-rbind(a,b)
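
To make the type point concrete, here is a small sketch using the example frames from the accepted answer. A bare `NA` is logical, whereas `as.double(NA)` (or the equivalent constant `NA_real_`) is double, so the new column in b is numeric from the moment it is created.

```r
a <- data.frame(a = c(0, 1, 2), b = c(3, 4, 5), c = c(6, 7, 8))
b <- data.frame(a = c(9, 10, 11), c = c(12, 13, 14))

typeof(NA)             # "logical"
typeof(as.double(NA))  # "double"
typeof(NA_real_)       # "double"; NA_real_ is the built-in double NA constant

b$b <- NA_real_        # same effect as as.double(NA)
new <- rbind(a, b)
typeof(new$b)          # "double"
```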
mdml
Anton