Matching Two Pandas DataFrames based on values in columns

Question

I'm trying to match job candidates to mentors based on different several variables that would hopefully create a good match. There are two Pandas DataFrames (one for candidates and one for mentors) that I'm trying to connect based on experience, location, desired job, etc.

For example I have a mentor DataFrame that might look something like the below:

mentor_df = pd.DataFrame({
      "Mentor":["Bob", "Kate", "Joe", "Mark"],
      "Experience":[3, 4, 5, 4],
      "Location": ["US", "FR", "JP", "US"],
      "Industry": ["Tech", "Tech", "Media", "Medicine"]
    })
    Mentor  Experience Location Industry
0  Bob     3         US       Tech
1  Kate    4         FR       Tech
2  Joe     5         JP       Media
2  Mark    4         US       Medicine

Along with a corresponding candidate DataFrame that looks like the below:

candidate_df = pd.DataFrame({
      "Candidate":["Candidate 1", "Candidate 2", "Candidate 3"],
      "Experience":[4, 4, 5],
      "Location": ["US", "FR", "JP", "US"],
      "Industry": ["Tech", "Media", "Medicine"]
    })
   Candidate      Experience Location Industry
0  Candidate 1    4         US        Tech
1  Canidate  2    4         FR        Media
2  Canidate  3    5         JP        Medicine

The resulting DataFrame connecting the candidates to mentors might look something like the below.

    Mentor  Experience Location Candidate    Industry
0  Bob     3         US       NaN          NaN
1  Kate    4         FR       Candidate 2  Tech
2  Joe     5         JP       Candidate 3  Media
2  Mark    4         US       Candidate 1  Medicine

Note that the DataFrame for the mentors will be larger than the candidates as every candidate should receive.

Any thoughts on how to best approaches this problem will be greatly appreciated :)

P.S. I know this might read like a homework problem, but the application that this is intended for is to help match separating service members and veterans find jobs by pairing them with mentors in career fields they're interested in.

Both dfs are currently named `mentor_df` but assuming they weren't, also assuming the second df could be constructed and that the fourth entry in location was removed, and the industries aligned correctly: something like `mentor_df.merge(candidate_df, how='left')`. Lot's of good information here: [Pandas Merging 101](https://stackoverflow.com/q/53645882/15497888) — Henry Ecker, Jun 03 '21 at 23:07
Oops, thanks for the catch. Updated to candidate_df and will take a look at merge now... — sokeefe, Jun 03 '21 at 23:13

score 1 · Accepted Answer · answered Jun 04 '21 at 00:08

@Henry is on the right path. You'll need to modify your candidate dataframe to a) make sure all arrays are the same length (or add NaNs if you don't have them, and b) tweak a bit to make sure you actually have some matches.

I used your mentor_df, and the following candidate_df:

    candidate_df = pd.DataFrame({
          "Candidate":["Candidate 1", "Candidate 2", "Candidate 3", "Candidate 4"],
          "Experience":[4, 4, 5, 4],
          "Location": ["US", "FR", "JP", "US"],
          "Industry": ["Tech", "Media", "Medicine", "Medicine"]
        })

Then the merge works fine:

merged = mentor_df.merge(candidate_df, how='left')

Output:

 Mentor  Experience Location  Industry    Candidate
0    Bob           3       US      Tech          NaN
1   Kate           4       FR      Tech          NaN
2    Joe           5       JP     Media          NaN
3   Mark           4       US  Medicine  Candidate 4

Note you need to get to the last row before you have both a candidate and a mentor since this is matching on experience, location, and industry, and unless all three match you get a NaN either in candidate or mentor.

Good luck!

Matching Two Pandas DataFrames based on values in columns

1 Answers1