2

I have to data frames, df_a and df_b with a different number of rows. Here is an example of my data structure.

df_a

id   ident   var
1    Test1   
2    Test1   
3    Test2   
4    Test1
5    Test3

df_b

id   ident   var
1    Test1   26
2    Test2   59

Now I want to copy the df_b$var to df_a$var, but only if the ident is matched.

The result needs to look like this:

df_a

id   ident   var
1    Test1   26
2    Test1   26
3    Test2   59
4    Test1   26
5    Test3   NA

I'm not quite sure how to do this - may someone help?

Sven Ritter
  • 25
  • 1
  • 6
  • 3
    http://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right?lq=1 – erc Sep 29 '15 at 13:48

3 Answers3

3

Using your data:

#I have removed the var column as, 1) it is blank in your case
#and 2) it will be filled in any way 
df_a <- read.table(header=T, text='id   ident 
1    Test1   
2    Test1   
3    Test2   
4    Test1
5    Test3')

df_b <- read.table(header=T, text='id   ident   var
1    Test1   26
2    Test2   59')

This is in base R:

#df_a stays as it is since you need all the columns
#from df_b we just need indent for the link and var to be added
#the by argument indicates the "link" column
#all.x=TRUE states that all columns from df_a will be kept in the result
merge(df_a, df_b[c('ident','var')], by='ident', all.x=TRUE)

  ident id var
1 Test1  1  26
2 Test1  2  26
3 Test1  4  26
4 Test2  3  59
5 Test3  5  NA
LyzandeR
  • 37,047
  • 12
  • 77
  • 87
2

We could use join from data.table. We convert the first 'data.frame' to 'data.table' (setDT(df_a)), join 'df_b' using on = 'ident'.

library(data.table)#v1.9.6+
setDT(df_a)[df_b, var := i.var, on = 'ident'][]
#   id ident var
#1:  1 Test1  26
#2:  2 Test1  26
#3:  3 Test2  59
#4:  4 Test1  26
#5:  5 Test3  NA

NOTE: In the above solution, I removed the empty 'var' column of 'df_a'.

EDIT: Based on @Aruns' comment.


Or we could use match from base R to get the numeric index and use that to get the corresponding 'var' from 'df_b'. This method will also work even if we have an empty 'var' column in 'df_a'.

df_a$var <- df_b$var[match(df_a$ident, df_b$ident)]
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Using the dplyr package, it’s straightforward:

result = left_join(df_a, df_b, by = 'ident')

This will however, copy over some redundant columns. To clean these up, use:

result = select(result, id = id.x, ident, var = var.y)
Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214