-1

Here is my data:

my_df_1 <- data.frame(col_1 = c(1,2,3,4,5,15), col_2 = c(4,5,6,8,9,17))
my_df_2 <- data.frame(col_1 = c(1,6,3,4,4), col_2 = c(4,5,5,11,13), col_3 = c(7,8,9,10,11))

my_df_1

my_df_1

my_df_2

my_df_2

I would like to join my_df_1 and my_df_2 on col_1 and col_2 and get my_df_3

my_df_3 <-data.frame(col_1 = c(1,2,2,3,4,4,5,15), col_2 = c(4,5,5,6,8,8,9,17),
                     col_3 = c(7,8,9,9,10,11,NA, NA))

my_df_3

my_df_3

Here is a logic of the join. We start with row one of the my_df_1, if I can match values in both columns with my_df_2 then simply pull values from col_3 from my_df_2. For example the first row is matched completely and we simply get value from col_3 = 7.

In the second row of my_df_1 we could only match value in the second column (5) so we got value from column 3 = 8. 5 of second column was also found in and 3 of my_df_2 so we also pulled col_3 = 9 from third row.

In third row of my_df_1 we could only match value in the first column, so we pulled value 9.

Similarly in the 4th row we only matched 4 in two rows of my_df_2 and we pulled 10 and 11.

Other rows were not matched so we ended up with NA. This is a bit similar to left join, but also very different.

What kind of join is it? What is the easiest way to accomplish it?

Update

Thank you everyone for the comments and suggestions. I am struggling with choosing right title for my question. And I also failed to come up with minimal example. It looks like my example came out too abstract. So I am going to make my example more concrete here (but still somewhat minimal). I have database of employees. There are three columns for each employee and there are plenty of nulls.

employee table

I also have compensation table with the same column. compensation table

For each employee I would like to compute relevant compensation. If can match all columns like in case of employee 5, then answer is clear: 8. When I cannot match all columns like in case of employee 1, I would like to take average over matched values 2 and 8 => 5. That is it. I agree that this does not look like any particular join and it looks like the solution is to take consecutive left joins over power set of columns descending from the biggest to the lowest number of columns and stopping on the match.

user1700890
  • 7,144
  • 18
  • 87
  • 183
  • 1
    This really isn't a type of join, it's sequential data processing (with joins and some conditional logic). You have multiple potential joins in there, and frankly it seems rather specific to whatever you are meaning to do with it ... there doesn't seem to be a way of looking at this that is general enough to be called a type of join. – r2evans Aug 13 '21 at 22:26
  • @akrun, I think it is replicated in my_df_3 there is 3 and 6 and corresponding value 9 – user1700890 Aug 13 '21 at 22:37
  • Please ask 1 specific researched non-duplicate question. [ask] [Help] What is a "kind of join"? Why ask? Where are you stuck coding? When giving a business relation(ship)/association or table (base or query result), say what a row in it states about the business situation in terms of its column values. [mre] PS Please clarify via edits, not comments. PS Please [use text, not images/links, for text--including tables](https://meta.stackoverflow.com/q/285551/3404097). Use images only for what cannot be expressed as text or to augment text. Include a legend/key & explanation with an image. – philipxy Aug 14 '21 at 01:56
  • We don't conditionally join. We join multiple tables, then we keep just a row for every output row, then for each row we select output column values that are a function of its column values. That is a faq. Please before considering posting read the manual & google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular names/strings/numbers, 'site:stackoverflow.com' & tags; read many answers. Reflect your research. See [ask], [Help] & the voting arrow mouseover texts. If you post a question, use one phrasing as title. – philipxy Aug 14 '21 at 02:01
  • 2
    I interpreted your question as [Merge dataframes by a match in at least one of two columns](https://stackoverflow.com/questions/16293518/merge-dataframes-by-a-match-in-at-least-one-of-two-columns). – Henrik Aug 14 '21 at 12:18
  • If that is the case, you may want to rephrase the title and question to make it more searchable and increase its value for future visitors. Cheers. – Henrik Aug 15 '21 at 09:23
  • @Henrik, do you mean my comment about multiple columns? Philpxy's comment made me thinking what proper title should be? – user1700890 Aug 15 '21 at 12:45
  • I was just thinking that your title and description may be more explicit and relate to the logic behind the problem you wish to solve, like e.g. the link I posted. But I may have misunderstood your objective. Cheers – Henrik Aug 15 '21 at 13:39

1 Answers1

1

I don't think there is any name for this type of operation but you can achieve the desired output using series of joins and combining them.

library(dplyr)

df1 <- my_df_1 %>% 
  inner_join(my_df_2, by = c('col_1', 'col_2'))

df2 <- my_df_1 %>% 
          inner_join(my_df_2, by = 'col_1') %>%
          rename(col_2 = col_2.x) %>%
          select(-col_2.y)

df3 <- my_df_1 %>% 
         inner_join(my_df_2, by = 'col_2') %>%
         rename(col_1 = col_1.x) %>%
         select(-col_1.y)

bind_rows(df1, df2, df3) %>%
  distinct() %>%
  right_join(my_df_1, by = c('col_1', 'col_2'))

#  col_1 col_2 col_3
#1     1     4     7
#2     3     6     9
#3     4     8    10
#4     4     8    11
#5     2     5     8
#6     2     5     9
#7     5     9    NA
#8    15    17    NA
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thank you, this is good answer! The only problem is that I have 7 columns to do this type of join. I illustrated with only 2 columns. I guess I have to consider power set of my columns. – user1700890 Aug 14 '21 at 20:59