I want to convert a row of a large DataFrame, which contains repeated strings, into a list without duplicates using dict.fromkeys(). Here is a simple df for demonstration.

Input:

import pandas as pd
df = pd.DataFrame({'A':['X'],'B':['X'],'C':['X'],'D':['Y'],'E':['Y'],'F':['Y']})
df_list = df.values.tolist()
l= list(dict.fromkeys(df_list))

Output (df, df_list, and the error):

     A  B  C  D  E  F
  0  X  X  X  Y  Y  Y

[['X', 'X', 'X', 'Y', 'Y', 'Y']]

           l= list(dict.fromkeys(df_list))
TypeError: unhashable type: 'list'

Desired Output:

A list of the unique values: ['X', 'Y']

I recognise that the problem is caused by the list being nested inside another list. Is there perhaps a direct way of extracting the non-repeated elements from the dataframe row?

shoggananna

2 Answers

df.iloc[0,:].unique().tolist()

Edit: A plain-Python approach also works, although a set does not preserve the original order of the values:

list(set(df.iloc[0,:]))
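
The two one-liners above differ in ordering: Series.unique() returns values in order of first appearance, while set() makes no ordering promise. A minimal sketch using the frame from the question (the variable names row, ordered, unordered and as_sorted are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': ['X'], 'B': ['X'], 'C': ['X'],
                   'D': ['Y'], 'E': ['Y'], 'F': ['Y']})

row = df.iloc[0, :]                # first row as a Series

ordered = row.unique().tolist()    # values in order of first appearance
unordered = list(set(row))         # same members, order not guaranteed
as_sorted = sorted(set(row))       # sort explicitly if a stable order is needed
```

If the order of the values matters, unique() or an explicit sort is probably the safer choice.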
1

If you insist on using dict.fromkeys(), you can do:

list(dict.fromkeys(df_list[0]))
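
To spell out why the original call failed: df.values.tolist() returns one inner list per row, and dict.fromkeys() tried to use that inner list as a key, but lists are not hashable. Indexing the first row hands it the strings instead. A short sketch of both cases (the names failed, unique_values and same_result are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': ['X'], 'B': ['X'], 'C': ['X'],
                   'D': ['Y'], 'E': ['Y'], 'F': ['Y']})
df_list = df.values.tolist()       # [['X', 'X', 'X', 'Y', 'Y', 'Y']]

# The outer list has a single element, which is itself a list,
# so it cannot be used as a dict key.
try:
    dict.fromkeys(df_list)
    failed = False
except TypeError:
    failed = True

# Passing the inner list makes the strings the keys; dicts keep
# insertion order, so the first occurrence of each value wins.
unique_values = list(dict.fromkeys(df_list[0]))

# Iterating a Series yields its values, so the row can be passed directly.
same_result = list(dict.fromkeys(df.iloc[0]))
```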

But I would suggest other methods, such as:

df.apply(lambda row: pd.Series(row).drop_duplicates(keep='first'),axis='columns').loc[0,:].to_list()

or:

df.stack().reset_index().drop(columns='level_1')[0].drop_duplicates().to_list()

or:

pd.DataFrame(df.apply(pd.Series.unique, axis=0)).loc[0,:].drop_duplicates().to_list()

or:

pd.DataFrame(list(map(pd.unique, df.values))).loc[0,:].to_list()

These are based on the question "Removing duplicates from Pandas rows".
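
If the real frame has more than one row, the same dict.fromkeys() idea can be applied row by row. This is only a sketch: df_multi below is a hypothetical two-row frame, not data from the question, and unique_per_row is an illustrative name.

```python
import pandas as pd

# Hypothetical two-row frame, made up for illustration.
df_multi = pd.DataFrame({'A': ['X', 'P'],
                         'B': ['X', 'Q'],
                         'C': ['Y', 'P']})

# One de-duplicated list per row, keeping the order of first appearance.
unique_per_row = [list(dict.fromkeys(row)) for row in df_multi.values.tolist()]
```

Returning a list of lists avoids the NaN padding that a rectangular DataFrame would need when rows have different numbers of unique values.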

zabop