Python Pandas merge 2 dataframes

Question

I am trying to merge 2 dataframes that have the same information but broken down differently

df1: #net total at team level

Team    Current Sales    Previous Sales    Team Total Diff
Blue    10               5                 5
Orange  20               8                 12
Yellow  40               11                29

df2: #net total's broken down by region

Team    Region    Curr Sales    Prev Sales    Net Diff
Blue    East      4             4             0
Blue    West      6             1             5
Orange  East      6             3             3
Orange  West      14            5             9
Yellow  East      15            3             12
Yellow  West      25            8             17

Merged Dataframe:

Team    Region    Curr Sales    Previ Sales    Net Diff   Team Total Diff
Blue    East      4             4              0           5
Blue    West      6             1              5           5
Orange  East      6             3              3           12
Orange  West      14            5              9           12 
Yellow  East      15            3              12          29
Yellow  West      25            8              17          29

I am doing this so i can do additional statistical functions in new columns, but i am not sure how to merge the two. If i add df1['Team Total Diff'] to df2, it fills the first 3 records and doesnt fill down for each team name.

if i use the following merge function, i dont see any change:

df2.merge(df1[['team_sort', 'Team']], how='inner', on='Team')

'team_sort' is used as an index to keep the teams sorted based on Net Team Diff in ascending order

Any help would be appreciated

Possible duplicate of [Python Pandas - Understanding inplace=True](https://stackoverflow.com/questions/43893457/python-pandas-understanding-inplace-true) — BallpointBen, Aug 16 '18 at 18:13
It is unclear what the problem is. `pd.DataFrame.merge` produces a *new* DataFrame object. If you're looking to add a column to `df2`, you need to specifically do that. — PMende, Aug 16 '18 at 18:15
@BallpointBen - I don't think [`merge`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html) offers an `inplace` kwarg — Battery_Al, Aug 16 '18 at 18:26
@Battery_Al The answer in that thread will answer OP's question. — BallpointBen, Aug 16 '18 at 18:45

score 2 · Accepted Answer · answered Aug 16 '18 at 18:40

You can use map in this situtaion:

df2['Team Total Diff'] = df2['Team'].map(df1.set_index('Team')['Team Total Diff'])
df2

Output:

     Team Region  Curr Sales  Prev Sales  Net Diff  Team Total Diff
0    Blue   East           4           4         0                5
1    Blue   West           6           1         5                5
2  Orange   East           6           3         3               12
3  Orange   West          14           5         9               12
4  Yellow   East          15           3        12               29
5  Yellow   West          25           8        17               29

score 1 · Answer 2 · answered Aug 16 '18 at 18:24

merge is the right method, but you're using it wrong. Try this out:

merged_df = df2.merge(df1[['Team', 'Team Total Diff']], on=['Team'])

This is because merge, like most methods of DataFrame, actually produces a new DataFrame object rather than altering self.

There can be some trickiness with how indexes are treated, so I usually just reset the index before merging dataframes.

score -1 · Answer 3 · answered Aug 16 '18 at 18:14

-1

I think this should do it:

merged_df = pd.merge(df1, df2, how=right, left_on="Team", right_on="Team")

answered Aug 16 '18 at 18:14

Toby Petty

4,431
1
17
29

Will not produce the expected output. e.g column `Previ Sales` – mad_ Aug 16 '18 at 18:26

score -1 · Answer 4 · answered Aug 16 '18 at 18:19

-1

merged_df = pd.concat([df1,df2], join='inner')

default for join is outer, so try inner. If that doesnt work do outer

merged_df = pd.concat([df1,df2], join='outer')

answered Aug 16 '18 at 18:19

RustyShackleford

3,462
9
40
81

Python Pandas merge 2 dataframes

4 Answers4