0

I am trying to merge 2 dataframes that have the same information but broken down differently

df1: #net total at team level

Team    Current Sales    Previous Sales    Team Total Diff
Blue    10               5                 5
Orange  20               8                 12
Yellow  40               11                29

df2: #net total's broken down by region

Team    Region    Curr Sales    Prev Sales    Net Diff
Blue    East      4             4             0
Blue    West      6             1             5
Orange  East      6             3             3
Orange  West      14            5             9
Yellow  East      15            3             12
Yellow  West      25            8             17

Merged Dataframe:

Team    Region    Curr Sales    Previ Sales    Net Diff   Team Total Diff
Blue    East      4             4              0           5
Blue    West      6             1              5           5
Orange  East      6             3              3           12
Orange  West      14            5              9           12 
Yellow  East      15            3              12          29
Yellow  West      25            8              17          29

I am doing this so i can do additional statistical functions in new columns, but i am not sure how to merge the two. If i add df1['Team Total Diff'] to df2, it fills the first 3 records and doesnt fill down for each team name.

if i use the following merge function, i dont see any change:

df2.merge(df1[['team_sort', 'Team']], how='inner', on='Team')

'team_sort' is used as an index to keep the teams sorted based on Net Team Diff in ascending order

Any help would be appreciated

John
  • 479
  • 8
  • 21
  • 1
    Possible duplicate of [Python Pandas - Understanding inplace=True](https://stackoverflow.com/questions/43893457/python-pandas-understanding-inplace-true) – BallpointBen Aug 16 '18 at 18:13
  • It is unclear what the problem is. `pd.DataFrame.merge` produces a *new* DataFrame object. If you're looking to add a column to `df2`, you need to specifically do that. – PMende Aug 16 '18 at 18:15
  • @BallpointBen - I don't think [`merge`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html) offers an `inplace` kwarg – Battery_Al Aug 16 '18 at 18:26
  • @Battery_Al The answer in that thread will answer OP's question. – BallpointBen Aug 16 '18 at 18:45

4 Answers4

2

You can use map in this situtaion:

df2['Team Total Diff'] = df2['Team'].map(df1.set_index('Team')['Team Total Diff'])
df2

Output:

     Team Region  Curr Sales  Prev Sales  Net Diff  Team Total Diff
0    Blue   East           4           4         0                5
1    Blue   West           6           1         5                5
2  Orange   East           6           3         3               12
3  Orange   West          14           5         9               12
4  Yellow   East          15           3        12               29
5  Yellow   West          25           8        17               29
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
1

merge is the right method, but you're using it wrong. Try this out:

merged_df = df2.merge(df1[['Team', 'Team Total Diff']], on=['Team'])

This is because merge, like most methods of DataFrame, actually produces a new DataFrame object rather than altering self.

There can be some trickiness with how indexes are treated, so I usually just reset the index before merging dataframes.

Battery_Al
  • 779
  • 1
  • 5
  • 20
-1

I think this should do it:

merged_df = pd.merge(df1, df2, how=right, left_on="Team", right_on="Team")
Toby Petty
  • 4,431
  • 1
  • 17
  • 29
-1
merged_df = pd.concat([df1,df2], join='inner')

default for join is outer, so try inner. If that doesnt work do outer

merged_df = pd.concat([df1,df2], join='outer')
RustyShackleford
  • 3,462
  • 9
  • 40
  • 81