I have an example df
:
import pandas as pd
import numpy as np
df = pd.DataFrame({'name':['Josh', 'Paul','Ivy','Mark'],
'orderId':[1,2,3,4],
'purchases':[['sofa','sofa','chair'],
['chair','sofa'],
['sofa','chair'],
['sofa','chair','chair']]})
4 people bought the same items - sofa & chair
but in different quantities, but in general they all bought a sofa
and a chair
- only 1 combination of the distinct product, think of it as a set(purchases)
.
I want to answer how many times each combination of purchases has been bought - we know it's 4
since 4
people bought the same set of items.
So I thought of this as a pseudo code where I group by the set of each purchase
value:
df = df.groupby(set('purchases')).agg({'orderId':pd.Series.nunique})
But I get an error which is expected:
TypeError: 'set' object is not callable
I am wondering what is the best way to achieve the grouping by the set
of a value rather then the actual value, which in this case is a list.
When I tried simply grouping by the purchases
df = df.groupby('purchases').agg({'orderId':pd.Series.nunique})
I get :
TypeError: unhashable type: 'list'\
I tried changing the lists to tuples:
df = pd.DataFrame({'name':['Josh', 'Paul','Ivy','Mark'],
'orderId':[1,2,3,4],
'purchases':[('sofa','sofa','chair'),
('chair','sofa'),
('sofa','chair'),
('sofa','chair','chair')]})
And then
df = df.groupby('purchases').agg({'purchases':lambda x:{y for y in x}}) # or set(x)
But this gives
purchases
purchases
(chair, sofa) {(chair, sofa)}
(sofa, chair) {(sofa, chair)}
(sofa, chair, chair) {(sofa, chair, chair)}
(sofa, sofa, chair) {(sofa, sofa, chair)}
There is still a tuple inside the set as it looks for the same tuples and not looking inside the tuple?
And I tried:
df['purchases_unique'] = df['purchases'].apply(lambda x: set(x))
df['# of times bought'] = df.apply(lambda x: x.value_counts())
But I get :
TypeError: unhashable type: 'set' While the Jupyter notebook still provides an answer with log message:
Exception ignored in: 'pandas._libs.index.IndexEngine._call_map_locations'
Traceback (most recent call last):
File "pandas\_libs\hashtable_class_helper.pxi", line 1709, in pandas._libs.hashtable.PyObjectHashTable.map_locations
TypeError: unhashable type: 'set'
{sofa, chair} 4
So in summary I am both looking for an answer how could I assign the value 4 to each of the rows so that the result would look like this:
name orderId purchases # of times bought
Josh 1 (sofa, sofa, chair) 4
Paul 2 (chair, sofa) 4
Ivy 3 (sofa, chair) 4
Mark 4 (sofa, chair, chair) 4
And why isn't pandas allowing to group by a set
if python is capable of evaluating {'chair', 'sofa'} == {'sofa', 'chair'}
?