I have a pandas DataFrame with dimensions (20000, 3000).
Some of its columns are duplicates of each other but have different headings. How can I remove those duplicates while keeping the original columns in pandas?

jr123456jr987654321
Does this answer your question? [python pandas remove duplicate columns](https://stackoverflow.com/questions/14984119/python-pandas-remove-duplicate-columns). There are also solutions that do not depend on the column names, but only on the values (what you want). – sandertjuh May 24 '21 at 11:29
1 Answer
You can use the following to remove duplicated columns according to their values:
df = df.T.drop_duplicates().T
For example:
import pandas as pd
df = pd.DataFrame(
    {'A': [2, 4, 8, 0],
     'B': [2, 0, 0, 0],
     'B_duplicated': [2, 0, 0, 0],
     'C': [10, 2, 1, 8]})
df = df.T.drop_duplicates().T
This results in:
   A  B   C
0  2  2  10
1  4  0   2
2  8  0   1
3  0  0   8
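
A possible variation, not part of the answer above: transposing twice can upcast columns to a common dtype when the frame mixes types, and it copies the data both ways. The sketch below transposes only once to build a boolean mask with `DataFrame.duplicated`, then selects the surviving columns from the original frame. The names `is_duplicate` and `deduped` are made up for illustration, and the example data is the same as above.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [2, 4, 8, 0],
     'B': [2, 0, 0, 0],
     'B_duplicated': [2, 0, 0, 0],
     'C': [10, 2, 1, 8]})

# Transpose once so each original column becomes a row, then flag rows
# that repeat an earlier row. keep='first' leaves the first occurrence
# (the "original" column) unflagged.
is_duplicate = df.T.duplicated(keep='first')

# Select the unflagged columns from the original frame, so the kept
# columns are taken as they are rather than rebuilt from a transpose.
deduped = df.loc[:, ~is_duplicate]

print(deduped.columns.tolist())  # ['A', 'B', 'C']
```

On a frame as wide as the one in the question, the transpose used to build the mask may still be costly, so it is probably worth timing both versions on the real data.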

Antoine Dubuis