
I am trying to calculate the Median of Groups over columns. I found a very clear example at

Pandas: Calculate Median of Group over Columns

That question and its answer are exactly what I needed. I recreated the posted example to work through the details on my own:

import pandas
import numpy

data_3 = [2,3,4,5,4,2]
data_4 = [0,1,2,3,4,2]

df = pandas.DataFrame({'COL1': ['A','A','A','A','B','B'], 
                       'COL2': ['AA','AA','BB','BB','BB','BB'],
                       'COL3': data_3,
                       'COL4': data_4})

m = df.groupby(['COL1', 'COL2'])[['COL3','COL4']].apply(numpy.median)

When I try to calculate the median of the groups over the columns, I get this error:

TypeError: Series.name must be a hashable type

If I run the exact same code, replacing only median with a different statistic (mean, min, max, std), everything works fine.

I don't understand the cause of this error and why it only occurs for median, which is what I really need to calculate.

Thanks in advance for your help,

Bob

Here is the full error message. I am using Python 3.5.2.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-af0ef7da3347> in <module>()
----> 1 m = df.groupby(['COL1', 'COL2'])[['COL3','COL4']].apply(numpy.median)

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/groupby.py in apply(self, func, *args, **kwargs)
    649         # ignore SettingWithCopy here in case the user mutates
    650         with option_context('mode.chained_assignment', None):
--> 651             return self._python_apply_general(f)
    652 
    653     def _python_apply_general(self, f):

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/groupby.py in _python_apply_general(self, f)
    658             keys,
    659             values,
--> 660             not_indexed_same=mutated or self.mutated)
    661 
    662     def _iterate_slices(self):

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/groupby.py in _wrap_applied_output(self, keys, values, not_indexed_same)
   3373                 coerce = True if any([isinstance(x, Timestamp)
   3374                                       for x in values]) else False
-> 3375                 return (Series(values, index=key_index, name=self.name)
   3376                         ._convert(datetime=True,
   3377                                   coerce=coerce))

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    231         generic.NDFrame.__init__(self, data, fastpath=True)
    232 
--> 233         self.name = name
    234         self._set_axis(0, index, fastpath=True)
    235 

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   2692             object.__setattr__(self, name, value)
   2693         elif name in self._metadata:
-> 2694             object.__setattr__(self, name, value)
   2695         else:
   2696             try:

/Applications/anaconda3/lib/python3.5/site-packages/pandas/core/series.py in name(self, value)
    307     def name(self, value):
    308         if value is not None and not com.is_hashable(value):
--> 309             raise TypeError('Series.name must be a hashable type')
    310         object.__setattr__(self, '_name', value)
    311 

TypeError: Series.name must be a hashable type
Bob Kraft
    @CodeCupboard How could this be a concern? `np.median([1,2])` (even number of values) works as well as `np.median([1,2, 3])` (odd number). – Qaswed Sep 24 '19 at 10:50

1 Answer

Somehow the series name at this stage is being interpreted as un-hashable, despite supposedly being a tuple. I think it may be the same bug as the one fixed and closed:

Basically, single scalar values in groups (as you have in your example) caused the name of the Series not to be passed through. This is fixed in pandas 0.19.2.
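
To see whether an installation is at or past the release mentioned above, a minimal sketch like the following may help. The helper name `version_tuple` is hypothetical and introduced only for illustration, and the 0.19.2 threshold is taken from this answer rather than verified independently.

```python
import re

import pandas


def version_tuple(version):
    """Return the first three numeric components of a version string as ints."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


# Compare the installed pandas version with the release named in this answer.
installed = version_tuple(pandas.__version__)
has_fix = installed >= (0, 19, 2)
print(pandas.__version__, "is at or past 0.19.2:", has_fix)
```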


In any case, it shouldn't be a practical concern since you can (and should) call mean, median, etc. on GroupBy objects directly.

>>> df.groupby(['COL1', 'COL2'])[['COL3', 'COL4']].median()
           COL3  COL4
COL1 COL2            
A    AA     2.5   0.5
     BB     4.5   2.5
B    BB     3.0   3.0
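
As a sketch of the alternatives, the block below rebuilds the DataFrame from the question and computes the per-column group medians three ways without passing `numpy.median` to `apply`. The variable names `grouped`, `direct`, `by_name` and `per_group` are only illustrative, and the third form assumes each group is handed to the lambda as a sub-DataFrame holding just the selected columns.

```python
import pandas

# Same data as in the question.
df = pandas.DataFrame({'COL1': ['A', 'A', 'A', 'A', 'B', 'B'],
                       'COL2': ['AA', 'AA', 'BB', 'BB', 'BB', 'BB'],
                       'COL3': [2, 3, 4, 5, 4, 2],
                       'COL4': [0, 1, 2, 3, 4, 2]})

grouped = df.groupby(['COL1', 'COL2'])[['COL3', 'COL4']]

# 1. Call the GroupBy method directly (the approach recommended above).
direct = grouped.median()

# 2. Name the statistic as a string and let pandas dispatch to its own median.
by_name = grouped.agg('median')

# 3. Use apply, but call the DataFrame's own median on each group so the
#    result keeps one value per column instead of collapsing to a scalar.
per_group = grouped.apply(lambda g: g.median())

print(direct)
```

All three should produce the same table as the output shown above; the direct call is the shortest and is the one this answer recommends.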
miradulo