Convert a dataframe row of repeated strings to a list without duplicates

Question

I want to convert a row of a large dataframe that contains repeated strings into a list without duplicates, using dict.fromkeys(). Here is a small df for demonstration.

Input:

import pandas as pd

df = pd.DataFrame({'A': ['X'], 'B': ['X'], 'C': ['X'], 'D': ['Y'], 'E': ['Y'], 'F': ['Y']})
df_list = df.values.tolist()  # one inner list per row
l = list(dict.fromkeys(df_list))  # raises the TypeError shown below

Output (df, df_list, then the error):

     A  B  C  D  E  F
  0  X  X  X  Y  Y  Y

[['X', 'X', 'X', 'Y', 'Y', 'Y']]

           l= list(dict.fromkeys(df_list))
TypeError: unhashable type: 'list'

Desired Output:

A list of the distinct values: X, Y

I recognise that the problem is caused by the list nested within a list. Is there perhaps a direct way of extracting the non-repeated elements from the dataframe row?

What do you want as the final output? A list like `lst = ['X','Y']`? – balderman Oct 18 '20 at 11:22

Nuno B. Brandao · Accepted Answer · 2020-10-18T11:42:11.920

df.iloc[0,:].unique().tolist()

Edit: a more pure-Python approach could also be used:

list(set(df.iloc[0,:]))
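The two one-liners above differ in one respect that may matter: as far as I know, Series.unique() returns values in order of first appearance, whereas a set makes no ordering guarantee. A minimal sketch of the comparison, reusing the df from the question (the variable names are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': ['X'], 'B': ['X'], 'C': ['X'], 'D': ['Y'], 'E': ['Y'], 'F': ['Y']})
row = df.iloc[0, :]  # first row as a Series

# Series.unique() keeps the order in which values first appear.
ordered = row.unique().tolist()

# A set removes duplicates too, but its iteration order is not guaranteed.
unordered = list(set(row))

print(ordered)
print(sorted(unordered))  # sorted only to make the comparison stable
```

If the order of the result matters, the first form is probably the safer choice.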

edited Oct 18 '20 at 11:42

answered Oct 18 '20 at 11:25

zabop · Answer 2 · 2020-10-18T11:36:47.080

If you insist on using dict.fromkeys(), you can do:

list(dict.fromkeys(df_list[0]).keys())
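To illustrate why the [0] matters: df.values.tolist() returns one inner list per row, and a list cannot be used as a dictionary key, which is what triggers the TypeError in the question. The sketch below reuses the question's df; error_message and first_row_unique are illustrative names only:

```python
import pandas as pd

df = pd.DataFrame({'A': ['X'], 'B': ['X'], 'C': ['X'], 'D': ['Y'], 'E': ['Y'], 'F': ['Y']})
df_list = df.values.tolist()  # a list of rows; each row is itself a list

# Passing the outer list makes each inner list a candidate key, which fails
# because lists are not hashable.
error_message = None
try:
    dict.fromkeys(df_list)
except TypeError as exc:
    error_message = str(exc)

# Passing the inner list uses the strings as keys; dicts keep insertion order,
# so duplicates collapse while first-seen order is preserved.
first_row_unique = list(dict.fromkeys(df_list[0]))

print(error_message)
print(first_row_unique)
```

Calling list() on the dict already iterates over its keys, so the trailing .keys() is optional.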

But I would suggest other methods, such as:

df.apply(lambda row: pd.Series(row).drop_duplicates(keep='first'),axis='columns').loc[0,:].to_list()

or:

df.stack().reset_index().drop(columns='level_1')[0].drop_duplicates().to_list()

or:

pd.DataFrame(df.apply(pd.Series.unique, axis=0)).loc[0,:].drop_duplicates().to_list()

or:

pd.DataFrame(list(map(pd.unique, df.values))).loc[0,:].to_list()

These are based on: Removing duplicates from Pandas rows.
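Since the question mentions a large dataframe, here is a rough sketch of how the dict.fromkeys() idea might be applied to every row at once. The helper unique_per_row and the two-row sample data are hypothetical, made up for illustration, and not part of the original question:

```python
import pandas as pd


def unique_per_row(frame):
    """Hypothetical helper: one de-duplicated list per row, in first-seen order."""
    # frame.values.tolist() yields one plain list per row; dict.fromkeys()
    # drops repeats within each row while keeping insertion order.
    return [list(dict.fromkeys(row)) for row in frame.values.tolist()]


# Made-up sample data with two rows.
sample = pd.DataFrame({'A': ['X', 'P'], 'B': ['X', 'Q'], 'C': ['Y', 'P']})
result = unique_per_row(sample)
print(result)  # [['X', 'Y'], ['P', 'Q']]
```

This assumes every cell value is hashable (strings are); rows containing lists or other unhashable objects would raise the same TypeError as in the question.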

edited Oct 18 '20 at 11:36

answered Oct 18 '20 at 11:29

I don't insist on it, but it's good to know your proposal also works. Thanks. – shoggananna Oct 18 '20 at 11:31
@user9106985, alright; added some other methods in case you need them. – zabop Oct 18 '20 at 11:38
Nice alternatives! Thanks – shoggananna Oct 18 '20 at 11:43
