
I have 2 DataFrames, df0 and df1, with df1.shape[0] > df0.shape[0].

df0 and df1 have the exact same columns. Most of the rows of df0 are in df1.

The indices of df0 and df1 are

df0.index = range(df0.shape[0])
df1.index = range(df1.shape[0])

I then created dft

dft = pd.concat([df0, df1], axis=0, sort=False)

and removed duplicated rows with

dft.drop_duplicates(subset='this_col_is_not_index', keep='first', inplace=True)

I now have some duplicate labels in the index of dft. For example:

dft.loc[3].shape

returns

(2, 38)

My aim is to change the index of the second returned row so that the label 3 becomes unique. That second row should be re-indexed as dft.index.sort_values()[-1] + 1.

I would like to apply this operation to all duplicates.
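For reference, a minimal sketch of that re-indexing, assuming dft is the concatenated frame above (dup_mask, new_labels and start are only illustrative helper names):

import numpy as np
import pandas as pd

# mark rows whose index label has already appeared earlier in dft
dup_mask = dft.index.duplicated(keep='first')

# give those rows fresh labels, starting right after the current maximum label
start = dft.index.max() + 1
new_labels = dft.index.to_numpy().copy()
new_labels[dup_mask] = np.arange(start, start + dup_mask.sum())
dft.index = pd.Index(new_labels)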

References:

Python Pandas: Get index of rows which column matches certain value

Pandas: Get duplicated indexes

Redefining the Index in a Pandas DataFrame object

Basile

2 Answers


Add the parameter ignore_index=True to concat to avoid duplicated index values:

dft = pd.concat([df0, df1], axis=0, sort=False, ignore_index=True)
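As a quick illustration with small toy frames (the column name a and the values are chosen here just for the example), the result then gets a fresh, duplicate-free RangeIndex:

df0 = pd.DataFrame({'a': [1, 2]})
df1 = pd.DataFrame({'a': [2, 3, 4]})
dft = pd.concat([df0, df1], axis=0, sort=False, ignore_index=True)
print(dft.index)   # RangeIndex(start=0, stop=5, step=1)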
jezrael

Use reset_index(drop=True) and assign the result back, since reset_index returns a new DataFrame unless inplace=True is passed:

dft = dft.reset_index(drop=True)
Bharath_Raja