I am working with a large dataframe in Python and have noticed that some columns hold identical values in every row, BUT the column names are different. Some of the values are text or time-series data.

Is there an easy way to get rid of these duplicate columns and keep the first one each time?

Many thanks

1 Answer

Let's create a dummy data frame in which two columns with different names are duplicates.

import pandas as pd

# col3 holds the same values as col1 under a different name
df = pd.DataFrame({
    'col1': [1, 2, 3, 'b', 5, 6],
    'col2': [11, 'a', 13, 14, 15, 16],
    'col3': [1, 2, 3, 'b', 5, 6],
})

    col1    col2    col3
0   1       11      1
1   2       a       2
2   3       13      3
3   b       14      b
4   5       15      5
5   6       16      6

To remove duplicate columns, transpose the data frame, apply drop_duplicates (which keeps the first occurrence by default), and then transpose back:

df.T.drop_duplicates().T

Result:

    col1    col2
0   1       11
1   2       a
2   3       13
3   b       14
4   5       15
5   6       16
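
One caveat with the double transpose: when the frame mixes types (numbers, text, timestamps), the transposed frame is stored as object, so the columns come back as object too. A possible variant, sketched here on a hypothetical three-column frame rather than the one above, uses the transpose only to build a boolean mask and then selects the columns from the original frame, which leaves the dtypes untouched.

```python
import pandas as pd

# Hypothetical example frame: col3 repeats col1 under a different name
df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'col2': ['a', 'b', 'c', 'd', 'e', 'f'],
    'col3': [1, 2, 3, 4, 5, 6],
})

# Boolean Series indexed by column name: True where a column repeats an earlier one
is_duplicate = df.T.duplicated()

# Pick the non-duplicate columns from the original frame, so dtypes are preserved
deduped = df.loc[:, ~is_duplicate]

# For comparison, the transpose round trip on the same frame
roundtrip = df.T.drop_duplicates().T

print(deduped.dtypes)    # col1 is still an integer column
print(roundtrip.dtypes)  # both columns are object
```

Both versions keep the first column of each duplicate group. The mask version still transposes once, so on a very wide or very long frame it may be worth testing on a sample first.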
Talha Anwar