1

I am trying to get the unique values from several columns into a new column using the 'set' function in Python 3. However, I am getting the error: "TypeError: 'Series' objects are mutable, thus they cannot be hashed". What am I doing wrong here?

Sample Data:

id,food 1,food 2,food 3
1,,apples,mango
2,oranges,grapes,oranges
3,bananas,,apples

Code:

import pandas as pd

df = pd.read_csv('food.csv')
df

# works with a plain list of strings
list(set(['apples','apples','oranges']))
# returns: ['apples', 'oranges'] (order may vary)

# fails if I pass in DataFrame columns. Why?
df['food_all'] = list(set([df['food 1'],df['food 2'],df['food 3']]))
df['food_all']

Desired output (ignoring spaces, null values, etc.):

id,food_all
1,['apples','mango']
2,['oranges','grapes']
3,['bananas','apples']
sharp

3 Answers

1

You can get the unique values in each row with a row-wise apply:

df.apply(lambda x: list(set(x.dropna())), axis=1)

which outputs

0      [mango, apples]
1    [grapes, oranges]
2    [bananas, apples]
dtype: object
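
If the frame also holds non-food columns such as id, the same row-wise apply can be limited to a column subset first. Below is a minimal sketch, assuming the column names from the question's sample data ('food 1', 'food 2', 'food 3'). The inline frame is only a stand-in for food.csv, and sorted() is added here purely to make the order of each list deterministic.

```python
import numpy as np
import pandas as pd

# Inline stand-in for food.csv (assumed to match the question's sample data).
df = pd.DataFrame({
    'id': [1, 2, 3],
    'food 1': [np.nan, 'oranges', 'bananas'],
    'food 2': ['apples', 'grapes', np.nan],
    'food 3': ['mango', 'oranges', 'apples'],
})

# Hypothetical helper name: the columns to merge.
food_cols = ['food 1', 'food 2', 'food 3']

# Per row: drop missing values, de-duplicate with set, sort for a stable order.
df['food_all'] = df[food_cols].apply(lambda row: sorted(set(row.dropna())), axis=1)

print(df[['id', 'food_all']])
```

Selecting the columns before calling apply keeps id (and anything else in the frame) out of the resulting lists.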
taras
1

This should work:

df = pd.read_csv('food.csv')

# per row: drop NaN, cast to str, de-duplicate, sort, then join into one string
df['food_all'] = (
    df[['food1', 'food2', 'food3']]
    .apply(lambda x: ', '.join(sorted(set(x.dropna().astype(str)))), axis=1)
    .values.tolist()
)

print(df)

result:

    food1   food2    food3         food_all
0   apples  apples    mango    mango, apples
1  oranges  grapes  oranges  grapes, oranges
2  bananas  apples     None  bananas, apples
gripep
  • Thanks for the reply. It works; however, it takes all the fields in the dataframe. I am trying to merge just these three columns (food1, food2, food3) and nothing else. I'll update the data above – sharp Jul 18 '18 at 15:30
0

You need to use pd.concat (or some other method) to create a non-unique list of each element in each DataFrame column. Then you can pass the non-unique list to the set function.

set(pd.concat([df['food 1'],df['food 2'],df['food 3']]))
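
Note that this yields one set for the whole frame rather than one per row, and any missing values are carried along. A small sketch of the same idea with the NaN entries dropped first, assuming an inline frame that mirrors the question's sample data (the variable names all_foods and unique_foods are made up for illustration):

```python
import numpy as np
import pandas as pd

# Inline stand-in for food.csv (assumed to match the question's sample data).
df = pd.DataFrame({
    'id': [1, 2, 3],
    'food 1': [np.nan, 'oranges', 'bananas'],
    'food 2': ['apples', 'grapes', np.nan],
    'food 3': ['mango', 'oranges', 'apples'],
})

# Stack the three columns into one long Series of individual values.
all_foods = pd.concat([df['food 1'], df['food 2'], df['food 3']])

# Drop NaN before building the set so only real food names remain.
unique_foods = set(all_foods.dropna())

print(sorted(unique_foods))
```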

EDIT

Sorry, I misunderstood your desired output the first time I read the question. This will get you the desired output:

def get_set(row):
    return set([row['food 1'], row['food 2'], row['food 3']])

df['food_all'] = df.apply(get_set, axis=1)

The original code fails because, as the error states, you can only pass hashable objects to a set. A set uses the hash value of each object internally, so every item passed to a set must be hashable. The items in the list you are using to construct the set are Series objects, which aren't hashable, so you cannot use that list to make the set.
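
The difference can be reproduced in isolation. The sketch below uses a made-up two-element Series; the exact wording of the TypeError message may differ between pandas versions, so it is printed rather than assumed.

```python
import pandas as pd

s = pd.Series(['apples', 'mango'])

# Strings are hashable, so a list of strings can be turned into a set.
print(sorted(set(['apples', 'apples', 'oranges'])))

# A list whose items are Series objects cannot: each item must be hashed.
raised = False
try:
    set([s, s])
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# Passing the Series itself works, because set() iterates over its values.
unique_values = set(s)
print(sorted(unique_values))
```

In other words, set(series) hashes the strings inside the Series, while set([series]) tries to hash the Series object itself.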

gaw89