I am stuck on what is probably an elementary problem with a pandas DataFrame. In the following code snippet, I insert a calculated column 'CAPACITY_CHECK' and then try to group the data by it. But I keep getting the following error: TypeError: unhashable type: 'numpy.ndarray'

TEMP['CAPACITY_CHECK'] = TEMP[['ADD_CAPACITY_ST', 'CAPACITY_ST', 'VOLUME_PER_SUPPLIER']].apply(lambda X: numpy.where(X[0]+X[1]<X[2],'Non OK', 'OK'), axis=1)
TEMP.groupby('CAPACITY_CHECK')['ID'].count()

Since I am not trying to modify any immutable object and the new column's type is "Series", I don't understand why I am getting this error.

Thanks in advance

Galileo

1 Answer

I think you need to remove apply and work only with numpy.where:

mask = (TEMP['ADD_CAPACITY_ST'] + TEMP['CAPACITY_ST']) < TEMP['VOLUME_PER_SUPPLIER']
TEMP['CAPACITY_CHECK'] = numpy.where(mask,'Non OK', 'OK')
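As a hedged aside on why the apply version likely fails: when numpy.where is called with scalar arguments, as happens once per row inside apply, it returns a 0-dimensional numpy.ndarray rather than a plain string. If those arrays end up stored in the column, groupby cannot hash them. The sketch below only demonstrates that scalar behaviour and the vectorized alternative; the small DataFrame is made-up sample data, and the exact behaviour of apply may vary between pandas versions.

```python
import numpy
import pandas as pd

# Made-up sample data, only for illustration
TEMP = pd.DataFrame({'ADD_CAPACITY_ST': [10, 20, 30],
                     'CAPACITY_ST': [10, 20, 30],
                     'VOLUME_PER_SUPPLIER': [40, 20, 100]})

# With a scalar condition, numpy.where returns a 0-dimensional ndarray, not a str
cell = numpy.where(10 + 10 < 40, 'Non OK', 'OK')
print(type(cell), cell.ndim)

# ndarrays are unhashable, which is what groupby trips over
try:
    hash(cell)
    hash_failed = False
except TypeError as err:
    hash_failed = True
    print(err)

# Vectorized version: one call on whole columns gives a column of plain strings
mask = (TEMP['ADD_CAPACITY_ST'] + TEMP['CAPACITY_ST']) < TEMP['VOLUME_PER_SUPPLIER']
TEMP['CAPACITY_CHECK'] = numpy.where(mask, 'Non OK', 'OK')
sizes = TEMP.groupby('CAPACITY_CHECK').size()
print(sizes)
```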

Sample:

import numpy
import pandas as pd

TEMP = pd.DataFrame({'ADD_CAPACITY_ST':[10,20,30],
                     'CAPACITY_ST':[10,20,30],
                     'VOLUME_PER_SUPPLIER':[40,20,100]})

# Build the boolean mask on whole columns, then map it to labels in one call
mask = (TEMP['ADD_CAPACITY_ST'] + TEMP['CAPACITY_ST']) < TEMP['VOLUME_PER_SUPPLIER']
TEMP['CAPACITY_CHECK'] = numpy.where(mask,'Non OK', 'OK')
print(TEMP)
   ADD_CAPACITY_ST  CAPACITY_ST  VOLUME_PER_SUPPLIER CAPACITY_CHECK
0               10           10                   40         Non OK
1               20           20                   20             OK
2               30           30                  100         Non OK

Then use GroupBy.size or GroupBy.count:

TEMP.groupby('CAPACITY_CHECK')['ID'].size()

TEMP.groupby('CAPACITY_CHECK')['ID'].count()

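The difference between count and size: size counts all rows in each group, including NaN values, while count counts only the non-NaN values.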
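To illustrate how the two methods can differ, here is a minimal sketch with a made-up DataFrame (not the asker's data) in which one 'ID' value is missing:

```python
import numpy
import pandas as pd

# Hypothetical data: the second 'OK' row has a missing ID
df = pd.DataFrame({'CAPACITY_CHECK': ['OK', 'OK', 'Non OK'],
                   'ID': [1.0, numpy.nan, 3.0]})

# size counts every row in the group, NaN included
sizes = df.groupby('CAPACITY_CHECK')['ID'].size()

# count skips rows where 'ID' is NaN
counts = df.groupby('CAPACITY_CHECK')['ID'].count()

print(sizes)
print(counts)
```

If 'ID' has no missing values, both calls should give the same numbers.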

jezrael