How to remove duplicate columns in Pandas?

Question

How can I delete all the duplicate REGION_y columns from a DataFrame and keep just one?

Index(['COUNTRY', 'DYSTOPIA RESIDUAL', 'ECONOMY GDP PER CAPITA', 'FAMILY',
       'FREEDOM', 'GENEROSITY', 'HAPPINESS RANK', 'HAPPINESS SCORE',
       'HEALTH LIFE EXPECTANCY', 'LOWER CONFIDENCE INTERVAL', 'STANDARD ERROR',
       'TRUST GOVERNMENT CORRUPTION', 'UPPER CONFIDENCE INTERVAL',
       'WHISKER HIGH', 'WHISKER LOW', 'YEAR', 'REGION_y', 'REGION_y',
       'REGION_y', 'REGION_y', 'REGION_y', 'REGION_x', 'REGION_y', 'REGION_x',
       'REGION_y', 'REGION_x', 'REGION_y', 'REGION_x', 'REGION_y'],
      dtype='object')

Possible duplicate of [Drop all duplicate rows in Python Pandas](https://stackoverflow.com/questions/23667369/drop-all-duplicate-rows-in-python-pandas) — Erfan, Oct 13 '19 at 22:00
Please provide more details. Judging by your code, to have only one "REGION_y" column it seems you would just need to add it once to the array: Index(['COUNTRY', , 'YEAR', 'REGION_y', 'REGION_y', 'REGION_x', dtype='object']. Why is it repeated? – Alex 75, Oct 14 '19 at 00:10

Massifox · Accepted Answer · 2019-10-13T22:10:22.327

Remove all duplicated columns, keeping the first occurrence of each column name, with this code:

df = df.loc[:, ~df.columns.duplicated()]

and the remaining columns will be:

Index(['COUNTRY', 'DYSTOPIA RESIDUAL', 'ECONOMY GDP PER CAPITA', 'FAMILY',
       'FREEDOM', 'GENEROSITY', 'HAPPINESS RANK', 'HAPPINESS SCORE',
       'HEALTH LIFE EXPECTANCY', 'LOWER CONFIDENCE INTERVAL', 'STANDARD ERROR',
       'TRUST GOVERNMENT CORRUPTION', 'UPPER CONFIDENCE INTERVAL',
       'WHISKER HIGH', 'WHISKER LOW', 'YEAR', 'REGION_y', 'REGION_x'],
      dtype='object')
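As a minimal, self-contained sketch of the same idea: the toy DataFrame below is hypothetical (it is not the asker's data) and only mimics the repeated REGION_y / REGION_x names. `Index.duplicated()` defaults to `keep='first'`, so the first column carrying each name survives and later repeats are dropped.

```python
import pandas as pd

# Hypothetical toy frame with repeated column names (an assumption for
# illustration, not the asker's actual data).
df = pd.DataFrame(
    [[2015, "a", "b", "c", "d"], [2016, "e", "f", "g", "h"]],
    columns=["YEAR", "REGION_y", "REGION_y", "REGION_x", "REGION_y"],
)

# duplicated() flags every repeat of a name after its first occurrence.
mask = df.columns.duplicated()

# Keep only the columns that are NOT flagged as repeats.
deduped = df.loc[:, ~mask]

print(list(deduped.columns))
```

Note that this compares column names only, not column contents, so it assumes the repeated columns hold the same data (or that the first one is the one worth keeping).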

edited Oct 13 '19 at 22:10

answered Oct 13 '19 at 21:59

You still have a `'REGION_y'` column. Use `duplicated(keep=False)` to remove **all** the columns. – Valentino Oct 13 '19 at 22:31
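A short sketch of the `keep=False` variant mentioned in this comment, again on a hypothetical toy frame rather than the asker's data. With `keep=False`, every column whose name appears more than once is flagged, so no copy of a repeated name survives.

```python
import pandas as pd

# Hypothetical toy frame with repeated column names (an assumption for
# illustration, not the asker's actual data).
df = pd.DataFrame(
    [[2015, "a", "b", "c", "d"], [2016, "e", "f", "g", "h"]],
    columns=["YEAR", "REGION_y", "REGION_y", "REGION_x", "REGION_y"],
)

# keep=False flags ALL occurrences of any name that appears more than once.
mask_all = df.columns.duplicated(keep=False)

# Dropping the flagged columns removes every REGION_y, including the first.
no_repeats = df.loc[:, ~mask_all]

print(list(no_repeats.columns))
```

Whether this or the default `keep='first'` is appropriate depends on whether one copy of the repeated column should be retained, as the question asks, or none at all.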
