
When I convert a column to the categorical dtype and map one of the aesthetics (aes()) to it, I get the following error:

NotImplementedError: isna is not defined for MultiIndex

Here's a reproducible example:

import numpy as np
import pandas as pd
from ggplot import ggplot, aes, geom_density

randCat = np.random.randint(0, 2, 500)
randProj = np.random.rand(1, 500)
df = pd.DataFrame({'proj': np.ravel(randProj), 'cat': np.ravel(randCat)})
df['cat'] = df['cat'].map({0: 'firstCat', 1: 'secondCat'})

# Converting to categorical is the step that triggers the error
df['cat'] = df['cat'].astype('category')
g = ggplot(aes(x='proj', color='cat', fill='cat'), data=df) + geom_density(alpha=0.7)
print(g)

I'm using pandas 0.22.0 and ggplot 0.11.5.

Interestingly, the plot comes out fine when I don't convert the "cat" column to a categorical type (it stays as strings). However, I need this column to be categorical for other purposes.

A more complete trace of the error:

     54     # hack (for now) because MI registers as ndarray
     55     elif isinstance(obj, ABCMultiIndex):
---> 56         raise NotImplementedError("isna is not defined for MultiIndex")
     57     elif isinstance(obj, (ABCSeries, np.ndarray, ABCIndexClass)):
     58         return _isna_ndarraylike(obj)

NotImplementedError: isna is not defined for MultiIndex

Thanks, Eyal.

EyalItskovits
  • Could you please post sample data? Based on your error, I want to take a particular look at your indices. – ParalysisByAnalysis Sep 24 '19 at 04:34
  • Please provide a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) & include some data: [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Trenton McKinney Sep 28 '19 at 17:08
  • You may want to look at this: https://github.com/has2k1/plotnine/issues/194. Why? My understanding is that plotnine doesn't support multilevel dataframes. – Technophobe01 Sep 28 '19 at 19:15
  • A more complete trace of that error would be helpful as well... – mayosten Sep 30 '19 at 03:33
  • Hi, I've added code that reproduces that error. – EyalItskovits Oct 03 '19 at 11:41

2 Answers


This is probably an edge case in which ggplot, combined with pandas, fails.

Looking at the ggplot source code, at the end of _construct_plot_data in ggplot.py we find:

groups = [column for _, column in discrete_aes]
if groups:
    return mappers, data.groupby(groups)
else:
    return mappers, [(0, data)]

So my guess is that the categorical column is used as a groupby key, and that this is what makes pandas break.
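With color='cat' and fill='cat', the groups list above contains 'cat' twice. The pandas-only sketch below (no ggplot needed) shows one plausible way categorical keys change the groupby result: when a key is categorical and observed=False, pandas returns every combination of categories, including empty ones, while string keys yield only the combinations that actually occur. pandas 0.22 predates the observed keyword and always behaves like observed=False; the sketch passes it explicitly so the result does not depend on the installed version's default. The toy data and the cat_fill column (a hypothetical stand-in for the second key) are made up. This illustrates a possible mechanism only; it does not confirm that this is what triggers the isna error.

```python
import pandas as pd

# Made-up toy data standing in for the question's df.
df = pd.DataFrame({
    "proj": [0.1, 0.4, 0.7, 0.9],
    "cat": ["firstCat", "secondCat", "firstCat", "secondCat"],
})
# Hypothetical second key mirroring color='cat' and fill='cat'
# (two grouping keys that always hold the same value).
df["cat_fill"] = df["cat"]


def n_groups(frame: pd.DataFrame) -> int:
    """Number of rows in the grouped result, i.e. the groups pandas reports."""
    grouped = frame.groupby(["cat", "cat_fill"], observed=False)["proj"].sum()
    return len(grouped)


# String (non-categorical) keys: only the combinations that occur.
as_strings = n_groups(df)
# Categorical keys with observed=False: every combination of categories,
# including pairs such as ("firstCat", "secondCat") that have no rows.
as_categories = n_groups(df.astype({"cat": "category", "cat_fill": "category"}))

print(as_strings)     # 2
print(as_categories)  # 4
```

Dropping fill='cat' leaves a single grouping key, which at least removes the empty cross-combinations.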

Try casting to object instead of category. For geom_density, also remove fill='cat', since it causes the lines and the legend to be rendered twice:

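import numpy as np
import pandas as pd
from ggplot import ggplot, aes, geom_density

randCat = np.random.randint(0, 2, 500)
randProj = np.random.rand(1, 500)
df = pd.DataFrame({'proj': np.ravel(randProj), 'cat': np.ravel(randCat)})
df['cat'] = df['cat'].map({0: 'firstCat', 1: 'secondCat'})
df['cat'] = df['cat'].astype('object')  # plain strings instead of category

# color only; adding fill='cat' draws the lines and legend twice
g = ggplot(aes(x='proj', color='cat'), data=df) + geom_density(alpha=0.7)
print(g)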

See also http://ggplot.yhathq.com/how-it-works.html and http://ggplot.yhathq.com/docs/geom_density.html
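The question needs 'cat' to stay categorical for other purposes. One way to reconcile that with the cast to object is to convert only a copy that is handed to ggplot. Below is a minimal sketch; categoricals_to_object is a hypothetical helper name, not part of pandas or ggplot, and the sample data is made up.

```python
import pandas as pd


def categoricals_to_object(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with every categorical column cast to object.

    The original DataFrame keeps its category dtype; only the returned copy changes.
    """
    cat_cols = frame.select_dtypes(include="category").columns
    return frame.astype({col: object for col in cat_cols})


df = pd.DataFrame({
    "proj": [0.2, 0.5, 0.8],
    "cat": pd.Categorical(["firstCat", "secondCat", "firstCat"]),
})
plot_df = categoricals_to_object(df)

print(df["cat"].dtype)       # category
print(plot_df["cat"].dtype)  # object
```

It could then be used as ggplot(aes(x='proj', color='cat'), data=categoricals_to_object(df)), leaving df itself categorical for the rest of the analysis.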

Elwin Arens

I worked around the "fill" issue by using the seaborn package instead:

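import matplotlib.pyplot as plt
import seaborn as sns

# One density curve per category; shade=True fills the area under each curve
# (seaborn 0.11+ renames this parameter to fill=True)
sns.kdeplot(df[df['cat'] == 'firstCat']['proj'], shade=True, label='firstCat')
sns.kdeplot(df[df['cat'] == 'secondCat']['proj'], shade=True, label='secondCat')
plt.show()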

[Plot: shaded density curves of proj for firstCat and secondCat]

EyalItskovits