
I have 2 DataFrames, df0 and df1, with df1.shape[0] > df0.shape[0].

df0 and df1 have the exact same columns. Most of the rows of df0 are in df1.

The indices of df0 and df1 are

df0.index = range(df0.shape[0])
df1.index = range(df1.shape[0])

I then created dft

dft = pd.concat([df0, df1], axis=0, sort=False)

and removed duplicated rows with

dft.drop_duplicates(subset='this_col_is_not_index', keep='first', inplace=True)

I now have some duplicate labels in the index of dft. For example:

dft.loc[3].shape

returns

(2, 38)

My aim is to change the index of the second returned row so that the label 3 becomes unique. That second row should be re-indexed as dft.index.sort_values()[-1] + 1.

I would like to apply this operation to all duplicates.
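For reference, a minimal sketch of that re-indexing, assuming dft is the concatenated frame above (dup_mask, new_labels and start are only illustrative helper names):

import numpy as np
import pandas as pd

# mark rows whose index label has already appeared earlier in dft
dup_mask = dft.index.duplicated(keep='first')

# give those rows fresh labels, starting right after the current maximum label
start = dft.index.max() + 1
new_labels = dft.index.to_numpy().copy()
new_labels[dup_mask] = np.arange(start, start + dup_mask.sum())
dft.index = pd.Index(new_labels)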

References:

Python Pandas: Get index of rows which column matches certain value

Pandas: Get duplicated indexes

Redefining the Index in a Pandas DataFrame object

Basile

2 Answers


Add the parameter ignore_index=True to concat to avoid duplicated index values:

dft = pd.concat([df0, df1], axis=0, sort=False, ignore_index=True)
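As a quick illustration with small toy frames (the column name a and the values are chosen here just for the example), the result then gets a fresh, duplicate-free RangeIndex:

df0 = pd.DataFrame({'a': [1, 2]})
df1 = pd.DataFrame({'a': [2, 3, 4]})
dft = pd.concat([df0, df1], axis=0, sort=False, ignore_index=True)
print(dft.index)   # RangeIndex(start=0, stop=5, step=1)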
jezrael

Use reset_index(drop=True) and assign the result back, since reset_index returns a new DataFrame unless inplace=True is passed:

dft = dft.reset_index(drop=True)
Bharath_Raja