Pandas Remove Duplicates from row

Question

I have a CSV file in which each row contains multiple duplicate values. I would like to remove these duplicates so that I am left with only the unique values.

Dataframe:

 1                            2          3                   4           5                              6    
Bypass User Account Control  T3431      Elevated Execution   T3424      Bypass User Account Control    T3431
Local Account                T3523      Domain Account       T4252      Local Account                  T3523

Expected Dataframe:

  1                            2          3                   4           5                              6    
Bypass User Account Control  T3431      Elevated Execution   T3424      
Local Account                T3523      Domain Account       T4252

There are hundreds of duplicate values in the rows, and I would like to see only the unique values.

jezrael · Accepted Answer · 2021-02-03T11:15:35.773

Convert each row to its unique values with `unique`. Because the output is a NumPy array, convert it back to a `Series`:

import pandas as pd

# unique() keeps the first occurrence of each value in row order
df1 = df.apply(lambda x: pd.Series(x.unique()), axis=1)
print(df1)
                             0      1                   2      3
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252
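A self-contained sketch of the `unique` approach, assuming the question's two rows and integer column labels 1 to 6 (an assumption; `read_csv` would normally produce string labels from a header row). It illustrates that `unique` keeps values in order of first appearance, so the result is renumbered 0 to k-1.

```python
import pandas as pd

# Sample rows taken from the question; integer column labels 1..6 are assumed
df = pd.DataFrame(
    [
        ["Bypass User Account Control", "T3431", "Elevated Execution", "T3424",
         "Bypass User Account Control", "T3431"],
        ["Local Account", "T3523", "Domain Account", "T4252",
         "Local Account", "T3523"],
    ],
    columns=[1, 2, 3, 4, 5, 6],
)

# unique() returns a NumPy array in order of first appearance; wrapping it in a
# Series lets apply(axis=1) expand each row into columns 0..k-1
df1 = df.apply(lambda x: pd.Series(x.unique()), axis=1)
print(df1)
```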

Or:

# drop_duplicates() keeps first occurrences; reset_index renumbers them 0..k-1
df1 = df.apply(lambda x: x.drop_duplicates().reset_index(drop=True), axis=1)
print(df1)
                             0      1                   2      3
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252
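To compare the two variants, here is a minimal sketch on hypothetical data in which the second row has fewer distinct values than the first. Under these assumptions both variants should give the same frame, with the shorter row padded with `NaN` on the right.

```python
import pandas as pd

# Hypothetical data: row 1 contains only three distinct values
df = pd.DataFrame(
    [
        ["A", "T1", "B", "T2", "A", "T1"],
        ["C", "T3", "C", "T3", "C", "D"],
    ],
    columns=[1, 2, 3, 4, 5, 6],
)

via_unique = df.apply(lambda x: pd.Series(x.unique()), axis=1)
via_drop = df.apply(lambda x: x.drop_duplicates().reset_index(drop=True), axis=1)

# Both keep the first occurrence of each value in row order;
# rows with fewer unique values are padded with NaN
print(via_drop)
```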

Finally, to restore the original column names, use:

df1.columns = df.columns[:len(df1.columns)]
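A small sketch of restoring the headers, assuming hypothetical string column labels `"1"` to `"4"` such as `read_csv` would produce from a header row. Because rows can have different numbers of unique values, `df1` has as many columns as the row with the most unique values, and only that many original labels are reused, left to right.

```python
import pandas as pd

# Hypothetical data with string headers; row 0 has 2 unique values, row 1 has 3
df = pd.DataFrame(
    [["A", "T1", "A", "T1"],
     ["B", "T2", "B", "C"]],
    columns=["1", "2", "3", "4"],
)

df1 = df.apply(lambda x: pd.Series(x.unique()), axis=1)

# Reuse only as many original labels as df1 has columns
df1.columns = df.columns[:len(df1.columns)]
print(df1)
```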

edited Feb 03 '21 at 11:15

answered Feb 03 '21 at 10:52

jezrael


Amazing, thank you very much. Is there a way I can keep the original headers from my file? – Will Feb 03 '21 at 11:14
@Will - Each row's Series can have a different length, so the output can have fewer columns than the input. You can add `df1.columns = df.columns[:len(df1.columns)]`, where `df1` is the output `DataFrame`. – jezrael Feb 03 '21 at 11:16

score 1 · Answer 2 · edited Feb 03 '21 at 11:01

Use

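# stack into one long Series, drop duplicates within each original row (level 0),
# then unstack back into columns and renumber the rows
df1 = (df.stack()
         .groupby(level=0).apply(lambda x: x.drop_duplicates())
         .unstack()
         .reset_index(drop=True))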

result:

                             1      2                   3      4
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252
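One difference worth checking between this approach and the `apply`-based one: `stack`/`unstack` keeps each surviving value under its original column label instead of shifting values left. The sketch below uses hypothetical data where the duplicate sits in a different column in each row. In the question's data the duplicates occupy the same columns in every row, so both approaches happen to agree there.

```python
import pandas as pd

# Hypothetical data: row 0 repeats "A" in column 3, row 1 repeats "D" in column 4
df = pd.DataFrame(
    [["A", "B", "A", "C"],
     ["D", "E", "F", "D"]],
    columns=[1, 2, 3, 4],
)

stacked = (df.stack()
             .groupby(level=0).apply(lambda x: x.drop_duplicates())
             .unstack()
             .reset_index(drop=True))

compacted = df.apply(lambda x: pd.Series(x.unique()), axis=1)

print(stacked)    # values stay under original labels; gaps become NaN
print(compacted)  # values shifted left into columns 0..k-1
```

If left-aligned rows are wanted, the `apply`-based variants seem the better fit; if keeping each value's original column matters, the `stack` version may be preferable.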

edited Feb 03 '21 at 11:01

Ferris


answered Feb 03 '21 at 10:53

wwnde

