Combining two pandas dataframes, adding up column Y dependent on column X duplicates

Question

I have two dataframes (df1 and df2), each with two columns (X and Y). I want to combine them into a new dataframe (df3), but with duplicates in column X added up: if a value of X appears in both dataframes (e.g. "B" in df1 and "B" in df2), the corresponding values of Y should be summed in df3 (e.g. 2 + 4 = 6, so "B" maps to 6 in df3).

import pandas as pd

df1 = [["A", "1"], ["B", "2"], ["C", "3"]]
df2 = [["B", "4"], ["C", "5"], ["D", "6"]]

df1 = pd.DataFrame(df1, columns=["X", "Y"])
df2 = pd.DataFrame(df2, columns=["X", "Y"])

df1['Y'] = df1['Y'].astype(int)
df2['Y'] = df2['Y'].astype(int)

df3 = df1.add(df2, fill_value=0)

print(df3)

The result is:

   X   Y
0  AB  5
1  BC  7
2  CD  9

However, what I want to achieve is the following:

      X    Y
 0    A    1
 1    B    6
 2    C    8
 3    D    6

Any suggestions? Thanks in advance!

sophocles · Answer 1 · 2020-12-24T11:10:58.703

1

You are looking for pd.concat().

Make sure you specify axis=0 (the default), which stacks the dataframes row-wise. With axis=1 they would instead be placed side by side as columns.

df3 = pd.concat([df1,df2],axis=0, ignore_index=True)

which prints:

   X  Y
0  A  1
1  B  2
2  C  3
3  B  4
4  C  5
5  D  6
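To make the difference between the two axes concrete, here is a minimal sketch that compares the resulting shapes. The names `rows` and `cols` are only illustrative, and the frames are rebuilt with integer Y values so the block runs on its own.

```python
import pandas as pd

# Same data as in the question, with Y already stored as integers.
df1 = pd.DataFrame({"X": ["A", "B", "C"], "Y": [1, 2, 3]})
df2 = pd.DataFrame({"X": ["B", "C", "D"], "Y": [4, 5, 6]})

# axis=0 stacks the frames on top of each other (more rows, same columns).
rows = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1 places the frames side by side (same rows, more columns).
cols = pd.concat([df1, df2], axis=1)

print(rows.shape)  # (6, 2)
print(cols.shape)  # (3, 4)
```

Note that with axis=1 the column labels X and Y each appear twice, which is rarely what you want for this kind of data.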

EDIT

Given your recent comment, how about the below which sums up Y when X is duplicated:

df3['Y_new'] = df3.groupby('X')['Y'].transform('sum')
df3.drop_duplicates('X',inplace=True)

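which prints:

   X  Y  Y_new
0  A  1      1
1  B  2      6
2  C  3      8
5  D  6      6

Y_new holds the summed values, while Y keeps the value from the first occurrence of each X.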
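If you only need the summed column, a possible alternative to transform plus drop_duplicates is to aggregate directly with groupby(...).sum(). This is a sketch of a different technique, not the approach above, and the name `summed` is just illustrative.

```python
import pandas as pd

# Same data as in the question, with Y already stored as integers.
df1 = pd.DataFrame({"X": ["A", "B", "C"], "Y": [1, 2, 3]})
df2 = pd.DataFrame({"X": ["B", "C", "D"], "Y": [4, 5, 6]})

# Stack both frames, then sum Y within each group of identical X values.
# as_index=False keeps X as a regular column instead of making it the index.
summed = (
    pd.concat([df1, df2], ignore_index=True)
    .groupby("X", as_index=False)["Y"]
    .sum()
)

print(summed)
```

This should give one row per distinct X with a fresh 0..3 index and a single Y column, which matches the layout asked for in the question.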

edited Dec 24 '20 at 11:10

answered Dec 24 '20 at 10:28

sophocles

Add `ignore_index=True` to get a unique index. +1 – cs95 Dec 24 '20 at 10:39
Noted, done, and thank you for the tip! – sophocles Dec 24 '20 at 10:41
I tried .concat(). However, I want to add up duplicates in column X: if a value of X appears in both dataframes (e.g. "B" in df1 and "B" in df2), the values of Y should be summed in df3 (e.g. 2 + 4 = 6, so "B" maps to 6). – Joeri Dec 24 '20 at 10:53
Check the updated answer now. I think it's what you need. – sophocles Dec 24 '20 at 11:04
.transform('sum') helped me out. Thank you! – Joeri Dec 24 '20 at 11:10
Please upvote and accept the answer if it satisfies you – sophocles Dec 24 '20 at 11:11
