Removing columns in Pandas

Question

I am working on a large pandas DataFrame and noticed that some columns have the same values in every row, but the columns' names are different. Some of the values are text or time-series data.

Is there an easy way to get rid of these duplicate columns and keep the first one each time?

Many thanks

Are the values partially duplicated or completely duplicated? — Talha Anwar, Jul 13 '20 at 13:42
Completely, as far as I can see (300,000 rows), including the format — Pierre Kovatcheva, Jul 13 '20 at 13:44
Welcome to SO. Please read https://stackoverflow.com/help/mcve and post your attempted code. — Bussller, Jul 13 '20 at 13:45
Does this answer your question? [python pandas remove duplicate columns](https://stackoverflow.com/questions/14984119/python-pandas-remove-duplicate-columns) — Niko Föhr, Jul 13 '20 at 14:03

score 1 · Answer 1 · answered Jul 13 '20 at 13:49

Let's create a dummy data frame in which two columns with different names hold identical values.

import pandas as pd
df = pd.DataFrame({
    'col1': [1, 2, 3, 'b', 5, 6],
    'col2': [11, 'a', 13, 14, 15, 16],
    'col3': [1, 2, 3, 'b', 5, 6],
})

    col1    col2    col3
0   1       11      1
1   2       a       2
2   3       13      3
3   b       14      b
4   5       15      5
5   6       16      6

To remove the duplicate columns, transpose the data frame, apply drop_duplicates, and transpose back:

df.T.drop_duplicates().T

Result:

    col1    col2
0   1       11
1   2       a
2   3       13
3   b       14
4   5       15
5   6       16
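
One thing the round trip through `.T` may change is the column dtypes: transposing a frame with mixed column types gives an object-dtype frame, and transposing back keeps it that way. A minimal sketch of a variant that only uses the transpose to find the duplicates and then selects from the original frame is below. The frame and the names `a_copy`, `when`, `round_trip` and `masked` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical frame: "a_copy" repeats "a" under a different name,
# and "when" stands in for the time-series data mentioned in the question.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["x", "y", "z"],
    "a_copy": [1, 2, 3],
    "when": pd.to_datetime(["2020-07-13", "2020-07-14", "2020-07-15"]),
})

# Approach from the answer: the duplicate column is removed,
# but every remaining column comes back with object dtype.
round_trip = df.T.drop_duplicates().T

# Variant: build a boolean mask over the column names
# (True where a column repeats an earlier one) and select
# the non-duplicated columns from the original frame.
is_dup = df.T.duplicated(keep="first")
masked = df.loc[:, ~is_dup]

print(round_trip.dtypes)
print(masked.dtypes)
```

Both results keep the columns `a`, `b` and `when`; only `masked` keeps the integer and datetime dtypes of the original columns.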

Talha Anwar

thanks Talha, is there no need to put an inplace=True somewhere to permanently modify the original df? – Pierre Kovatcheva Jul 13 '20 at 14:34
yes, if you want to replace the original df instead of creating a new one, you have to update df itself – Talha Anwar Jul 13 '20 at 14:41
where exactly can I place it? – Pierre Kovatcheva Jul 13 '20 at 17:33
`df = df.T.drop_duplicates().T` (chaining `.T` after `drop_duplicates(inplace=True)` fails, because with `inplace=True` the method returns `None`) – Talha Anwar Jul 14 '20 at 05:07
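
A minimal sketch of two ways to end up with an updated `df`, using the dummy frame from the answer: reassigning the transposed result, or dropping the duplicate columns with `DataFrame.drop(..., inplace=True)`. The names `reassigned` and `dup_cols` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3, "b", 5, 6],
    "col2": [11, "a", 13, 14, 15, 16],
    "col3": [1, 2, 3, "b", 5, 6],
})

# Option 1: build a new frame and bind it to a name
# (use `df = ...` to replace the original).
reassigned = df.T.drop_duplicates().T

# Option 2: find the names of the columns that repeat an earlier
# column, then drop them from df itself.
dup_cols = df.columns[df.T.duplicated().to_numpy()]
df.drop(columns=dup_cols, inplace=True)

print(list(df.columns))
```

Option 2 changes `df` directly and leaves the dtypes of the remaining columns untouched.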
