-2

I have two dataframes (df1/df2), each with two value columns(X/Y). Both dataframes should be combined into a new one (df3). However, I want to add up duplicates in column X. So, if the value of X matches in both dataframes (e.g: "B" in df1 and "B" in df2), I want the value of Y (in df3) to add up (e.g. "2" + "4" = "6"). df3: "B" = "6"

df1 = [["A", "1"], ["B", "2"], ["C", "3"]]
df2 = [["B", "4"], ["C", "5"], ["D", "6"]]

df1 = pd.DataFrame(df1, columns=["X", "Y"])
df2 = pd.DataFrame(df2, columns=["X", "Y"])

df1['Y'] = df1['Y'].astype(int)
df2['Y'] = df2['Y'].astype(int)

df3 = df1.add(df2, fill_value=0)

print(df3)

The result is:

   X   Y
0  AB  5
1  BC  7
2  CD  9

However, what I want to achieve is the following:

      X    Y
 0    A    1
 1    B    6
 2    C    8
 3    D    6

Any suggestions? Thanks in advance!

Joeri
  • 1
  • 2

1 Answers1

1

You are looking for pd.concat().

Make sure you specify axis=0, as this denotes that the concatenation should be done on rows, and not on columns. Note that axis=0 refers to rows, and axis=1 refers to columns.

df3 = pd.concat([df1,df2],axis=0, ignore_index=True)

which prints:

   X  Y
0  D  1
1  B  2
2  C  3
3  D  4
4  E  5
5  F  6

EDIT

Given your recent comment, how about the below which sums up Y when X is duplicated:

df3['Y_new'] = df3.groupby('X')['Y'].transform('sum')
df3.drop_duplicates('X',inplace=True)

which prints:

   X  Y 
0  D  1      
1  B  2      
2  C  3      
4  E  5      
5  F  6      
sophocles
  • 13,593
  • 3
  • 14
  • 33