
If I have a DataFrame like the one below:

|  Column A  |  Column B  |  Column C  |  Column D  |  Column E  |
|:-----------|:---------- |:-----------|:-----------|:-----------|
| 1          | 7          | 1          | 13         | 13         |
| 2          | 8          | 2          | 14         | 13         |
| 3          | 9          | 3          | 15         | 13         |
| 4          | 10         | 4          | 16         | 13         |
| NA         | 11         | NA         | 17         | 13         |
| 6          | 12         | 6          | 1          | 13         |

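I'd like to remove duplicate columns, i.e. drop either Column A or Column C, since they contain identical values. The comparison should be by value only, ignoring the column headers, and the fact that Column E repeats the same value in every row should not matter.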
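To experiment with this, the table can be rebuilt as a DataFrame. This is a minimal sketch that assumes the NA cells are missing values (np.nan) and that the headers are the literal labels shown. It also confirms that Column A and Column C hold the same values.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; "NA" is assumed to mean np.nan, which
# makes Column A and Column C float64 while the others stay int64.
df = pd.DataFrame({
    "Column A": [1, 2, 3, 4, np.nan, 6],
    "Column B": [7, 8, 9, 10, 11, 12],
    "Column C": [1, 2, 3, 4, np.nan, 6],
    "Column D": [13, 14, 15, 16, 17, 1],
    "Column E": [13] * 6,
})

# Series.equals treats NaNs in the same position as equal,
# so Column A and Column C compare as identical.
print(df["Column A"].equals(df["Column C"]))
```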

T.C. Proctor
CaesiumWhale
Comment: This has been marked as a duplicate question. It's not a duplicate of the other question. The other question relates to column names; this one relates to row values. – tommy.carstensen Sep 18 '22 at 13:01

2 Answers


You can transpose (so each column becomes a row), drop the duplicate rows, and then transpose back:

    df.T.drop_duplicates().T
gold_cy
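One side effect of the transpose round trip: a frame that mixes int64 and float64 columns is transposed into a single float64 block, so every kept column comes back as float64, including the integer Column B. A minimal sketch, again assuming NA means np.nan; the final astype step is an optional addition, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Example frame from the question; "NA" is assumed to mean np.nan.
df = pd.DataFrame({
    "Column A": [1, 2, 3, 4, np.nan, 6],
    "Column B": [7, 8, 9, 10, 11, 12],
    "Column C": [1, 2, 3, 4, np.nan, 6],
    "Column D": [13, 14, 15, 16, 17, 1],
    "Column E": [13] * 6,
})

# Rows of df.T are the original columns, so drop_duplicates removes
# Column C (identical to Column A); transposing back restores the layout.
deduped = df.T.drop_duplicates().T
print(list(deduped.columns))
print(deduped.dtypes)  # every column is float64 after the round trip

# Optional extra step (not part of the answer): cast each kept column
# back to the dtype it had in the original frame.
restored = deduped.astype(df.dtypes[deduped.columns].to_dict())
print(restored.dtypes)
```

If preserving dtypes matters, selecting columns from the original frame with a boolean mask avoids the cast entirely.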

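You can do that with DataFrame.duplicated on the transposed frame: each row of df.T is one original column, so the resulting boolean mask flags duplicate columns by value, regardless of their headers. Use keep to choose whether the first or the last of the duplicated columns is kept: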

    df.loc[:, ~df.T.duplicated(keep='first')]

    Column A  Column B  Column D  Column E
0      1.0        7       13       13
1      2.0        8       14       13
2      3.0        9       15       13
3      4.0       10       16       13
4      NaN       11       17       13
5      6.0       12        1       13
yatu
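Because the mask is applied with df.loc to the original frame, the kept columns retain their original dtypes (Column B stays int64). A minimal sketch of the keep options, assuming NA means np.nan: keep='last' keeps Column C instead of Column A, and keep=False drops every column that has a duplicate.

```python
import numpy as np
import pandas as pd

# Example frame from the question; "NA" is assumed to mean np.nan.
df = pd.DataFrame({
    "Column A": [1, 2, 3, 4, np.nan, 6],
    "Column B": [7, 8, 9, 10, 11, 12],
    "Column C": [1, 2, 3, 4, np.nan, 6],
    "Column D": [13, 14, 15, 16, 17, 1],
    "Column E": [13] * 6,
})

# Each row of df.T is one original column; duplicated() compares those
# rows by value only, so the column headers play no part.
first_kept = df.loc[:, ~df.T.duplicated(keep="first")]  # drops Column C
last_kept = df.loc[:, ~df.T.duplicated(keep="last")]    # drops Column A
none_kept = df.loc[:, ~df.T.duplicated(keep=False)]     # drops A and C

print(list(last_kept.columns))
print(first_kept.dtypes)  # Column B, D and E remain int64
```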