1

I have 3 data frames, which look like this:

Data Frame 1

ColA    ColB    ColC
1       B1      1-C1
2       B2      1-C2
3       B3      1-C3
4       B4  
5       B5  
6       B6  
7       B7  
8       B8  
9       B9      1-C9
10      B10 
11      B11 
12      B12     1-C12
13      B13 
14      B14     1-C14

Data Frame 2

ColA    ColB    ColC
1       B1  
2       B2  
3       B3      2-C3
4       B4      2-C4
5       B5  
6       B6      2-C6
7       B7  
8       B8      2-C8
9       B9      2-C9
10      B10 
11      B11 
12      B12     2-C12
13      B13 
14      B14     2-C14
15      B15     2-C15

Data Frame 3

ColA    ColB    3-Col3-C
1       B1  
2       B2  
3       B3       3-C3
4       B4  
5       B5  
6       B6       3-C6
7       B7  
8       B8  
9       B9       3-C9
10      B10 
11      B11      3-C11
12      B12      3-C12
13      B13      3-C13
14      B14 
15      B15      3-C15
16      B16      3-C16
17      B17      3-C17

I want to merge these data frames. The data in the first data frame takes first priority. The second data frames data takes second priority, and the third set's data takes last priority. How would I do so? My desired output is shown below.

Desired data frame:

ColA    ColB    ColC
1       B1      1-C1
2       B2      1-C2
3       B3      1-C3
4       B4      2-C4
5       B5  
6       B6      2-C6
7       B7  
8       B8      2-C8
9       B9      1-C9
10      B10 
11      B11     3-C11
12      B12     1-C12
13      B13     3-C13
14      B14     1-C14
15      B15     3-C15
16      B16     3-C16
17      B17     3-C17
J.Doe
  • 139
  • 10
  • 2
    What are the blank values in the columns? Are they NA values? Are they zero length strings? Space? It's easier to help if you share your data in a [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) without any ambiguity. – MrFlick Jul 20 '18 at 15:11
  • @MrFlick They're basically NA values. – J.Doe Jul 20 '18 at 15:18
  • If the data from first takes priority, then 2nd, and last 3rd, some values in the expected output is not showing it. For example 'B13' have NA for first and 2nd dataset, while for third it is not empty. Shouldn't it be the value of third data? – akrun Jul 20 '18 at 17:03
  • Sorry, that was a mistake on my part, I will change that. Thank you. – J.Doe Jul 20 '18 at 17:12
  • @J.Doe If that is the case, please check the posted solution. It should work – akrun Jul 20 '18 at 17:13
  • @akrun Thanks for the help. I had just one more question. How would you change the code you provided if you wanted to use 8 data frames instead of 3? I'm guessing you would have to add the data frames to the list, but would you have to change anything else? – J.Doe Jul 23 '18 at 16:11

1 Answers1

2

We could keep the datasets in a list, then do a full_join by 'ColA', 'ColB' and coalesce the 'ColC' columns to return the output

library(tidyverse)
list(df1, df2, df3) %>% 
    reduce(full_join, by = c('ColA', 'ColB')) %>% 
    transmute(ColA, ColB, ColC = coalesce(ColC.x, ColC.y, ColC))

NOTE: Based on the OP's comments, the blanks in the data are NA elements

akrun
  • 874,273
  • 37
  • 540
  • 662
  • how would you change this if you wanted to use 8 data frames instead of 3? I'm guessing you would have to add the data frames to the list, but would you have to change anything else? – J.Doe Jul 23 '18 at 14:57
  • 1
    @J.Doe If the object names are 'df1', 'df2', ... , 'df3', then you could get this in a list with `mget(ls(pattern="^df\\d+")) %>% reduce(` – akrun Jul 23 '18 at 16:55
  • Hey, just another followup question. Do you know how you would do the original question if you want it to keep the same amount of rows in data frame 1? So the output would be basically the same thing, but rows 15, 16, and 17 are not included. – J.Doe Jul 27 '18 at 18:39
  • @J.Doe In the `reduce` step, you may need `left_join` i.e. `reduce(left_join,` – akrun Jul 27 '18 at 21:12