
I have a pandas DataFrame of shape (20000, 3000) in which some columns are duplicated but have different headings. How can I remove those duplicates while keeping the original columns?

    Does this answer your question? [python pandas remove duplicate columns](https://stackoverflow.com/questions/14984119/python-pandas-remove-duplicate-columns). There are also solutions that do not depend on the column names, but only on the values (what you want). – sandertjuh May 24 '21 at 11:29

1 Answer


You can use the following to remove duplicated columns based on their values rather than their names:

df = df.T.drop_duplicates().T

For example:

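import pandas as pd

# 'B_duplicated' holds the same values as 'B' under a different heading
df = pd.DataFrame(
    {'A': [2, 4, 8, 0],
     'B': [2, 0, 0, 0],
     'B_duplicated': [2, 0, 0, 0],
     'C': [10, 2, 1, 8]})

# Transpose so columns become rows, drop duplicate rows, transpose back
df = df.T.drop_duplicates().T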

This keeps the first occurrence of each duplicated column and results in:

   A  B   C
0  2  2  10
1  4  0   2
2  8  0   1
3  0  0   8
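
For a frame as large as the one in the question, or one with mixed column types, a possible variation is to use the transpose only to find the duplicates and then select the surviving columns from the original frame, so the result keeps the original dtypes. This is a minimal sketch on the same toy data, not a benchmarked solution; the names `is_duplicate` and `deduped` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [2, 4, 8, 0],
     'B': [2, 0, 0, 0],
     'B_duplicated': [2, 0, 0, 0],
     'C': [10, 2, 1, 8]})

# True for each column whose values repeat an earlier column
is_duplicate = df.T.duplicated()

# Select the first occurrences directly from the original frame
deduped = df.loc[:, ~is_duplicate.to_numpy()]

print(deduped.columns.tolist())
```

Here `duplicated()` marks every repeat after the first occurrence by default, so `B` is kept and `B_duplicated` is dropped.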
Antoine Dubuis