Here is my data:
my_df_1 <- data.frame(col_1 = c(1,2,3,4,5,15), col_2 = c(4,5,6,8,9,17))
my_df_2 <- data.frame(col_1 = c(1,6,3,4,4), col_2 = c(4,5,5,11,13), col_3 = c(7,8,9,10,11))
my_df_1
my_df_2
I would like to join my_df_1 and my_df_2 on col_1 and col_2 and get my_df_3
my_df_3 <-data.frame(col_1 = c(1,2,2,3,4,4,5,15), col_2 = c(4,5,5,6,8,8,9,17),
col_3 = c(7,8,9,9,10,11,NA, NA))
my_df_3
Here is a logic of the join. We start with row one of the my_df_1, if I can match values in both columns with my_df_2 then simply pull values from col_3 from my_df_2. For example the first row is matched completely and we simply get value from col_3 = 7.
In the second row of my_df_1 we could only match value in the second column (5) so we got value from column 3 = 8. 5 of second column was also found in and 3 of my_df_2 so we also pulled col_3 = 9 from third row.
In third row of my_df_1 we could only match value in the first column, so we pulled value 9.
Similarly in the 4th row we only matched 4 in two rows of my_df_2 and we pulled 10 and 11.
Other rows were not matched so we ended up with NA. This is a bit similar to left join, but also very different.
What kind of join is it? What is the easiest way to accomplish it?
Update
Thank you everyone for the comments and suggestions. I am struggling with choosing right title for my question. And I also failed to come up with minimal example. It looks like my example came out too abstract. So I am going to make my example more concrete here (but still somewhat minimal). I have database of employees. There are three columns for each employee and there are plenty of nulls.
I also have compensation table with the same column.
For each employee I would like to compute relevant compensation. If can match all columns like in case of employee 5, then answer is clear: 8. When I cannot match all columns like in case of employee 1, I would like to take average over matched values 2 and 8 => 5. That is it. I agree that this does not look like any particular join and it looks like the solution is to take consecutive left joins over power set of columns descending from the biggest to the lowest number of columns and stopping on the match.