1

I am trying to create all possible combinations of Positions and Employees by doing a cross join. But I want to add a condition while joining them. For eg. see below sample dataframes

>>> df_pos
  Country  Pos_id  level
0   India       1      2
1      UK       2      2
2     USA       3      4
3     UAE       4      5
4   India       5      1
5      UK       6      3

>>> df_emp
     Country  Emp_id  level
0         UK      11      3
1         UK      12      4
2        USA      13      4
3  Singapore      14      5
4      India      15      2

I want to find combination of Pos_id and Emp_id where the country is matched first and then the matching job level should be equal to or +1/-1, For eg. we have a position in UK (pos_id 6) for which we have a match of Employees (11 & 12) whose country is also UK and job level is 3 & 4 respectively which is = and +1 job level of required position (i.e job level 3). See below sample output

output_df
0   Pos_id  Emp_id
1   1   15
2   2   11
3   3   13
4   5   15
5   6   11
6   6   12

Is there a simple way to do this in python pandas. Any help would be deeply appreciated.

Sheldore
  • 37,862
  • 7
  • 57
  • 71
Dan
  • 55
  • 1
  • 5

3 Answers3

2

You could filter on your condition after creating the cross-joined table:

output_df = pd.merge(df_pos, df_emp, how='outer', on='Country')
condition = (output_df.level_x - output_df.level_y).between(-1, 1)
output_df = df_merged[condition][['Pos_id', 'Emp_id']]
Chris Adams
  • 18,389
  • 4
  • 22
  • 39
1

I tried this:

    data1 = pd.DataFrame({"Country":["India","UK","USA","UAE","India","UK"], "Pos_id":[1,2,3,4,5,6], "level": [2,2,4,5,1,3]})
    data2 = pd.DataFrame({"Country":["UK", "UK","USA", "Singapore","India"],"Emp_id":[11,12,13,14,15], "level":[3,4,4,5,2]})

    def get_df(country,pos_id,min,max):
        lev = data1.loc[(data1.Country==country) & (data1.Pos_id==pos_id)]["level"]
        data = data2.loc[(data2.Country==country) &(data2["level"].isin(list(range(int(lev)-min,int(lev)+max))))] 
        return data


    print(get_df("UK",6,1,1))
Upasana Mittal
  • 2,480
  • 1
  • 14
  • 19
0

Could you do the join simply on Country and then filter afterwards?

e.g. I would rename the level columns to something like emp_level and pos_level to make it clearer and then:

df = df_pos.merge(df_emp, on='Country')

output_df = df.loc[(df['pos_level'] >= df['emp_level'] - 1) & (df['pos_level'] <= df['emp_level'] + 1)]

Not the prettiest thing but produces what you want.

jack-tee
  • 173
  • 1
  • 8