
I have concatenated multiple data sources, and each data source has its own unique columns. I want to merge rows that match on a list of columns so that the data from every source ends up in the same row.

Example:

import pandas as pd

df1 = pd.DataFrame({'SharedData': ['A', 'B', 'C', 'D', 'E'],
                    'df1Data': ['1', '2', '3', '4', '5']})
df2 = pd.DataFrame({'SharedData': ['D', 'E', 'F', 'G', 'H'],
                    'df2Data': ['4', '5', '6', '7', '8']})
newdf = pd.concat([df1, df2], axis=0, ignore_index=True)
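If the sources are still available as separate frames, an outer merge on the key columns may avoid the concat step altogether. The sketch below assumes every source shares the key columns and that non-key column names do not collide between sources; `merge_sources`, `key_cols` and `merged` are hypothetical names chosen for illustration.

```python
from functools import reduce

import pandas as pd


def merge_sources(frames, key_cols):
    """Hypothetical helper: full outer join of every frame on key_cols.

    Each source keeps its own columns; keys missing from a source get NaN.
    """
    return reduce(
        lambda left, right: pd.merge(left, right, on=key_cols, how='outer'),
        frames,
    )


df1 = pd.DataFrame({'SharedData': ['A', 'B', 'C', 'D', 'E'],
                    'df1Data': ['1', '2', '3', '4', '5']})
df2 = pd.DataFrame({'SharedData': ['D', 'E', 'F', 'G', 'H'],
                    'df2Data': ['4', '5', '6', '7', '8']})

key_cols = ['SharedData']  # the list of columns rows must match on
merged = merge_sources([df1, df2], key_cols)
print(merged)
```

Two caveats with this approach: if two sources share a non-key column name, `pd.merge` adds `_x`/`_y` suffixes, and a key repeated within one source multiplies the matching rows.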

I need the resulting data set to go from the Before state to the After state shown below.

Before Data Set:

SharedData  df1Data  df2Data
A           1
B           2
C           3
D           4
E           5
D                    4
E                    5
F                    6
G                    7
H                    8
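The blanks in the Before table come from how `pd.concat` aligns frames: it stacks the rows and takes the union of the column names, filling any column a frame lacks with NaN. A small check of that behaviour on the example frames:

```python
import pandas as pd

df1 = pd.DataFrame({'SharedData': ['A', 'B', 'C', 'D', 'E'],
                    'df1Data': ['1', '2', '3', '4', '5']})
df2 = pd.DataFrame({'SharedData': ['D', 'E', 'F', 'G', 'H'],
                    'df2Data': ['4', '5', '6', '7', '8']})

# Rows are stacked; df2 has no df1Data column and df1 has no df2Data
# column, so those cells become NaN (the blanks in the Before table).
newdf = pd.concat([df1, df2], axis=0, ignore_index=True)

print(newdf.columns.tolist())         # ['SharedData', 'df1Data', 'df2Data']
print(newdf['df1Data'].isna().sum())  # 5
print(newdf['df2Data'].isna().sum())  # 5
```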

After Data Set:

SharedData  df1Data  df2Data
A           1
B           2
C           3
D           4        4
E           5        5
F                    6
G                    7
H                    8

I need to collapse rows that share the same SharedData value into a single row that contains all of the source-specific columns.
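Working from the already concatenated frame, one possible approach is to group on the key columns and take `GroupBy.first()`, which returns the first non-null value of each column within a group. For D and E this picks `df1Data` from the df1 row and `df2Data` from the df2 row. This is a sketch assuming each source contributes at most one row per key; `key_cols` and `after` are hypothetical names.

```python
import pandas as pd

df1 = pd.DataFrame({'SharedData': ['A', 'B', 'C', 'D', 'E'],
                    'df1Data': ['1', '2', '3', '4', '5']})
df2 = pd.DataFrame({'SharedData': ['D', 'E', 'F', 'G', 'H'],
                    'df2Data': ['4', '5', '6', '7', '8']})
newdf = pd.concat([df1, df2], axis=0, ignore_index=True)

key_cols = ['SharedData']  # the list of columns rows must match on

# Collapse duplicate keys: within each group, first() keeps the first
# non-null value per column, so the NaN gaps from concat are filled in.
after = newdf.groupby(key_cols, as_index=False).first()
print(after)
```

If one source has several rows with the same key and different values, `first()` silently keeps only the first of them; also note that `groupby` drops rows whose key is NaN unless `dropna=False` is passed.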

Quang Hoang
