0

I have three data frames. One 'main' data frame to which I want to merge the other two sub data frames.

main = data.frame(a = LETTERS[1:5], b = round(rnorm(5),2))

a     b
A  0.41
B -1.06
C -0.65
D -1.27
E  1.03

sub1 = data.frame(a = LETTERS[2:4], c = sample(c(100, 200, 300), 3, replace=T))

a   c
B 200
C 200
D 200

sub2 = data.frame(a = LETTERS[c(1,5)], c = sample(c(999, 888), 2, replace=T))

a   c
A 999
E 888

In the first merge, I want to merge sub1 to main via "a" and this works.

merged1 = merge(main, sub1, by="a", all.x=T)

a     b   c
A  0.41  NA
B -1.06 300
C -0.65 300
D -1.27 100
E  1.03  NA

Now I want to join/merge the sub2 to merged1. In my case, the values provided by sub2 substitute the NAs in column c. Specifically I want to replace these NAs with the values from sub2. I tried this (and other arguments for all.x, all.y:

merge(merged1, sub2, by="a", all.x=T)

a     b c.x c.y
A  0.41  NA 999
B -1.06 300  NA
C -0.65 300  NA
D -1.27 100  NA
E  1.03  NA 999

How can I get this to be only one column c with the merge?

ben_aaron
  • 1,504
  • 2
  • 19
  • 39
  • 2
    Could you please 1) use set a seed number when generating these data.frames so that they're reproducible. 2) be consistent in naming conventions...you create data frames "sub1" and "main" but then use "df_main" and "df_sub1" in the merge. 3) check your example for accuracy. It appears df_sub1 and sub1 are different since the latter contains 300 but the former contains only 200. – AOGSTA Apr 27 '16 at 21:42
  • Just `rbind(df_sub1, df_sub2)` and merge to that instead of the individual little merges. You may need to convert the factor column to character to successfully `rbind`. – Gregor Thomas Apr 27 '16 at 21:55
  • I think the only other option is to post-process with a [coalesce-like function](http://stackoverflow.com/q/19253820/903061). – Gregor Thomas Apr 27 '16 at 21:58
  • Perhaps `match` is also an option: `merged1$c[ match(sub2$a, merged1$a)] <- sub2$c`. (Note that your example is a little inconsistent... `df_sub1`, `sub1`, ...) – lukeA Apr 27 '16 at 22:01

1 Answers1

4

Note: My seed was set, so the results are different than those in the question.

If you bind your sub1 or sub2 together, you can do it all in one merge.

set.seed(42)

main = data.frame(a = LETTERS[1:5], b = round(rnorm(5),2))

sub1 = data.frame(a = LETTERS[2:4], c = sample(c(100, 200, 300), 3, replace=T))

sub2 = data.frame(a = LETTERS[c(1,5)], c = sample(c(999, 888), 2, replace=T))

#Create one data frame
all_sub <- rbind(sub1,sub2)

merged <- merge(main, all_sub, by="a", all.x=T)

The output is as expected:

> merged
  a     b   c
1 A  1.37 999
2 B -0.56 200
3 C  0.36 300
4 D  0.63 300
5 E  0.40 999
Chris P
  • 225
  • 2
  • 10