0

I am trying to merge a 100-column dataset with roughly 5k rows (df1) with a single column of another dataset (df2$X), based on the ID column.

E.g.

df1:

Id    Price   Animals 
2     1       Dog
3     1       Cat
4     3       Dog
5     4       Cat
6     5       Mouse
10    9       Cat
16    0       Cat
14    4       Cat

df2: 
Id    X   
2     Kiwi      
5     Banana    
6     Apple      
7     Berries      
8     Mango      
10    Melon  

The resulting dataset should be:


Id    Price   Animals  X
2     1       Dog      Kiwi
3     1       Cat      NA
4     3       Dog      NA
5     4       Cat      Banana
6     5       Mouse    Apple
10    9       Cat      Melon
16    0       Cat      NA
14    4       Cat      NA

I tried doing:

merge(df1, df2, by="Id")

but this seems not to be working.

Thanks!

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66

2 Answers2

1
library(tidyr)
library(dplyr)
df1 %>% left_join(df2)
Joining, by = "Id"
# A tibble: 8 x 4
     Id Price Animals X     
  <dbl> <dbl> <chr>   <chr> 
1     2     1 Dog     Kiwi  
2     3     1 Cat     NA    
3     4     3 Dog     NA    
4     5     4 Cat     Banana
5     6     5 Mouse   Apple 
6    10     9 Cat     Melon 
7    16     0 Cat     NA    
8    14     4 Cat     NA    
> 
Karthik S
  • 11,348
  • 2
  • 11
  • 25
1

If you have duplicate columns in both the dataframe you can select only the columns that you need before merge.

df3 <- merge(df1, df2[c('Id', 'X')], by= "Id", all.x= TRUE)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213