3

im going to explain to you my question on base of the sample data. Here is first table (df1):

  x x1 y  z
1 1 10 a 11
2 3 11 b 13
3 5 10 c 15
4 7 11 d 17
5 9 10 e 19

here is a dput() version:

structure(list(x = c(1, 3, 5, 7, 9), x1 = c(10, 11, 10, 11, 10
), y = structure(1:5, .Label = c("a", "b", "c", "d", "e"), class = "factor"), 
    z = c(11, 13, 15, 17, 19)), .Names = c("x", "x1", "y", "z"
), row.names = c(NA, -5L), class = "data.frame")

and second table (df2):

  x x1
1 2 10
2 3 60

dput():

structure(list(x = c(2, 3), x1 = c(10, 60)), .Names = c("x", 
"x1"), row.names = c(NA, -2L), class = "data.frame")

I need to now bind rows of these two tables and fill the missing column values with values from df1. Let me explain you on base of these two tables.

At first i use smartbind() function from gtools library:

library(gtools)
data <- smartbind(df1, df2)

And the result that i get looks like that:

 x x1    y  z
 1 10    a 11
 3 11    b 13
 5 10    c 15
 7 11    d 17
 9 10    e 19
 2 10 <NA> NA
 3 60 <NA> NA

So i would like to fill up the all NA values which appear in the rows from df2, with df1 values if the x is the same. In this case it would look like that:

 x x1    y  z
 1 10    a 11
 3 11    b 13
 5 10    c 15
 7 11    d 17
 9 10    e 19
 2 10 <NA> NA
 3 60    b 13

In my original dataset i do have around 280 columns! Thanks for help

Is there any more ELEGANT way to do it rather then joining two data frames and then using rbind()

Mal_a
  • 3,670
  • 1
  • 27
  • 60
  • Seems like a duplicate to me: first you have to merge by `x` and select all `df2` records and only then use `rbind`. [How to join (merge) data frames (inner, outer, left, right)?](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – pogibas Jul 16 '18 at 10:37
  • is there any more elegant way to do it? i knew i could do it with merging and then using rbind, but it doesnt seem much elegant and efficient if i have huge tables (columns > 280, rows > 10000) – Mal_a Jul 16 '18 at 10:42

2 Answers2

5

First you can merge missing columns of df2 from df1, only keeping extra columns( y and z, and the key column x from df1):

df2 = merge(df2,df1[,c("x","y","z")],by="x",all.x=T)

and then rbind df1 and df2:

> rbind(df1,df2)
  x x1    y  z
1 1 10    a 11
2 3 11    b 13
3 5 10    c 15
4 7 11    d 17
5 9 10    e 19
6 2 10 <NA> NA
7 3 60    b 13
tyumru
  • 417
  • 3
  • 13
  • Just use `df1[, -2]` – pogibas Jul 16 '18 at 10:40
  • Thanks for the answer, however i am looking for more elegant way to do it, maaybe with data.table? – Mal_a Jul 16 '18 at 10:44
  • @PoGibas You're right but keeping it like this since column names are more comprehensible for demo purposes. – tyumru Jul 16 '18 at 10:46
  • @Mal_a you can wrap up the above process in a function to your taste and come up with your own elegant solution :) (PS: I guess columns > 280, rows > 10000 is no big deal for merge and rbind) – tyumru Jul 16 '18 at 10:54
1

Or using tidyverse

library(tidyverse)
df1 %>% 
   select(-x1) %>% 
   right_join(df2) %>%
   bind_rows(df1, .)
#  x x1    y  z
#1 1 10    a 11
#2 3 11    b 13
#3 5 10    c 15
#4 7 11    d 17
#5 9 10    e 19
#6 2 10 <NA> NA
#7 3 60    b 13

Or with data.table

nm1 <- setdiff(names(df1), c('x', 'x1'))
setDT(df2)[df1, (nm1) := mget(nm1), on = .(x)]
rbind(df1, df2)
#   x x1    y  z
#1: 1 10    a 11
#2: 3 11    b 13
#3: 5 10    c 15
#4: 7 11    d 17
#5: 9 10    e 19
#6: 2 10 <NA> NA
#7: 3 60    b 13
akrun
  • 874,273
  • 37
  • 540
  • 662