0

I want to populate the 'Salary' column in DataFrame1 (DF1) with the corresponding 'Salary' in DataFrame2 (DF2). These need to match on 'Team' AND 'Players'.

To note:

The Dataframes are: Not the same size. Not the same order.

import pandas as pd


#df 1:

nba_data = {'Team': ['Mavericks', 'Mavericks', 'Mavericks', '', 'NewYorkKnicks17','Houston Rockets', 'NewYorkKnicks17'], 
            'Players': ['Luka Doncic', 'Kristaps Porzingis', 'Jalen Brunson', 'Kristaps Porzingis', 'JR Smith',
                        'James Harden', 'Derrick Rose',],
            'Salary': ['0', '0', '0','0', '0', '0', '0'],
           'Coach': ['Rick Carlisle', 'Rick Carlisle', 'Steve Kerr', 'Phil Jackson', 'Tom Thibideou', '', '']}

nba_df1 = pd.DataFrame(nba_data)

nba_df1


#df2:

nba_data2 = {'Team': ['Mavericks', 'Mavericks', 'Mavericks', 'NewYorkKnicks17', 'NewYorkKnicks17', 'NewYorkKnicks17', 'Houston Rockets'], 
            'Players': ['Luka Doncic', 'Kristaps Porzingis', 'Steph Curry', 'JR Smith', 'Derrick Rose',
                        'Kristaps Porzingis', 'James Harden'],
            'Salary': ['3m', '126m', '0','115m', '0', '20m', '1.5m'],
            'Coach': ['Rick Carlisle', 'Rick Carlisle', 'Steve Kerr', '', 'Tom Thibideou', 'Phil Jackson', '']}


nba_df2 = pd.DataFrame(nba_data2)

nba_df2

Result desired = nba_df1 with the appropriate salaries populated (run the below):

nba_data3 = {'Team': ['Mavericks', 'Mavericks', 'Mavericks', '', 'NewYorkKnicks17','Houston Rockets', 'NewYorkKnicks17'], 
            'Players': ['Luka Doncic', 'Kristaps Porzingis', 'Jalen Brunson', 'Kristaps Porzingis', 'JR Smith',
                        'James Harden', 'Derrick Rose',],
            'Salary': ['3m', '126m', '0','20m', '115m', '1.5m', '0'],
           'Coach': ['Rick Carlisle', 'Rick Carlisle', 'Steve Kerr', 'Phil Jackson', 'Tom Thibideou', '', '']}



nba_df1_adjusted = pd.DataFrame(nba_data3)






Kindly note: this is not a tutorial. - it is a specific question and therefore not a duplicate of a general tutorial. 
jcloudz1
  • 59
  • 6
  • 1
    Possible duplicate of [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – baccandr Oct 31 '19 at 15:42
  • This is specific - that is a tutorial of some sort? @baccandr – jcloudz1 Oct 31 '19 at 15:46
  • That is a great QnA post, just go through it and you will be able to easily solve your problem! – baccandr Oct 31 '19 at 15:52
  • Looks like a good post, but it is incorrect to say this is a duplicate. The same could be said of if you linked a google search to a book and said if you just go through this you can solve your problem. I also didn't see any OR condition in that post - so it won't solve my problem seemingly. @baccandr thanks for linking though - may come in handy in the future. – jcloudz1 Oct 31 '19 at 15:53

1 Answers1

2
agg = pd.merge(nba_df1, nba_df2, on = ['Players', 'Team'], how = 'left')

Your result will be on Salary_y

Edit: Kind of dirty but it works:

agg = pd.merge(nba_df1, nba_df2[['Team', 'Players', 'Salary']], on = ['Players', 'Team'], how = 'left')
agg2 = pd.merge(nba_df1, nba_df2, on = ['Players', 'Coach'], how = 'left')

merge = pd.merge(agg, agg2, on = ['Players', 'Coach'])

merge['Salary'] = merge['Salary_y_x'].fillna(merge['Salary_y_y'])
Hamza
  • 167
  • 2
  • 7
  • Hi @Hamza, this leaves the 3rd row Salary_Y blank. Somewhere in here I need it to match on ['Players', 'Team] OR ['Players, 'Coach'] when there is no 'Team' as in row 3. If possible are you able to amend your answer? Many thanks – jcloudz1 Oct 31 '19 at 15:44
  • Why not just join on 'Player' then: ? `agg = pd.merge(nba_df1, nba_df2, on = ['Players''], how = 'left')` – Hamza Oct 31 '19 at 16:10
  • Because the data for a player ie: 'Kristaps Porzingis'. Is not the same for each team he has been a part of. ie: Row 1 (Mavericks | Kristaps Porzingis - salary = 126m). Whereas Row 3 (' ' | Kristaps Porzingis | Phil Jackson - salary = 20m). – jcloudz1 Oct 31 '19 at 16:21
  • Editted my answer hope it helps – Hamza Nov 01 '19 at 22:01