0

Given df1 as

df1 = data.frame(a = c(1,2,2), 
b = c(2, 2, 3), c = c(3, 2, 1), 
label = c("abc", "bbb", "bca"))

> df1
  a b c label
1 1 2 3   abc
2 2 2 2   bbb
3 2 3 1   bca

and df2 as

df2 = data.frame(a = c(2, 2, 1, 1, 1, 2), b = c(2, 2, 2, 2, 2, 3), 
c = c(2, 2, 3, 3, 3, 1), d = c(0.1, 0.2, 0.8, 0.7, 0.6, 0.9))

> df2
  a b c   d
1 2 2 2 0.1
2 2 2 2 0.2
3 1 2 3 0.8
4 1 2 3 0.7
5 1 2 3 0.6
6 2 3 1 0.9

What is the best way to merge them in order to get df3 as following?

df3 = data.frame(a = c(2, 2, 1, 1, 1, 2), b = c(2, 2, 2, 2, 2, 3), 
c = c(2, 2, 3, 3, 3, 1), d = c(0.1, 0.2, 0.8, 0.7, 0.6, 0.9), 
label = c("bbb", "bbb", "abc", "abc", "abc", "bca"))

> df3
  a b c   d label
1 2 2 2 0.1   bbb
2 2 2 2 0.2   bbb
3 1 2 3 0.8   abc
4 1 2 3 0.7   abc
5 1 2 3 0.6   abc
6 2 3 1 0.9   bca

I want the output to be the same size as df2 with all its attributes and values, plus an additional label column from df1 which indicates the unique combination group of a, b, c attributes.

I tried the basic merge function as

df4 = merge(df1, df2)

and it works fine for my sample data frames (like this example here), but when I want to apply this logic to larger data sets it doesn't return the output I expect. Specially the size of out put is not the same size as df2, but smaller size.

I was wondering if there is any alternative way to do this, or any idea about what can goes wrong applying the same logic (which seems to work fine in small data frames) in large data sets. Thanks!

Rotail
  • 1,025
  • 4
  • 19
  • 40
  • 1
    It's probably a smaller size because there are some (a, b, c) combinations that appear in `df2` but not `df1`. If you want to keep all the rows from `df2`, just add `all.y=TRUE` as an argument to `merge`. – josliber Sep 14 '17 at 17:36
  • Please see the marked duplicate -- you are looking for a right outer join. – josliber Sep 14 '17 at 17:37

0 Answers0