I have a simple pandas DataFrame, for which I would like to create a mosaic plot. Here is my code:

import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic 

mydata = pd.DataFrame({'id2': {64: 'Angelica', 
                               65: 'DXW_UID', 66: 'casuid01', 
                               67: 'casuid01', 68: 'EC93_uid', 
                               69: 'EC93_uid', 70: 'EC93_uid', 
                               60: 'DXW_UID',  61: 'AtmosFox', 
                               62: 'DXW_UID', 63: 'DXW_UID'}, 
                       'id1': {64: 'TGP', 
                               65: 'Retention01', 66: 'default',
                               67: 'default', 68: 'Musa_EC_9_3', 
                               69: 'Musa_EC_9_3', 70: 'Musa_EC_9_3', 
                               60: 'default', 61: 'default', 
                               62: 'default', 63: 'default'}})

mydata
            id1       id2
60      default   DXW_UID
61      default  AtmosFox
62      default   DXW_UID
63      default   DXW_UID
64          TGP  Angelica
65  Retention01   DXW_UID
66      default  casuid01
67      default  casuid01
68  Musa_EC_9_3  EC93_uid
69  Musa_EC_9_3  EC93_uid
70  Musa_EC_9_3  EC93_uid

[11 rows x 2 columns]

I can create a mosaic plot just fine when I exclude row 64.

mosaic(mydata[mydata.id1!='TGP'], ['id1','id2'])
(<matplotlib.figure.Figure object at 0x11E0D3B0>, OrderedDict([(('default', 'DXW_UID'), (0.0, 0.0, 0.594059405940594, 0.49504950495049505)), (('default', 'AtmosFox'), (0.0, 0.49834983498349833, 0.594059405940594, 0.16501650165016499)), (('default', 'casuid01'), (0.0, 0.66666666666666663, 0.594059405940594, 0.33003300330033009)), (('default', 'EC93_uid'), (0.0, 1.0, 0.594059405940594, 0.0)), (('Retention01', 'DXW_UID'), (0.599009900990099, 0.0, 0.09900990099009899, 0.99009900990099009)), (('Retention01', 'AtmosFox'), (0.599009900990099, 0.99339933993399343, 0.09900990099009899, 0.0)), (('Retention01', 'casuid01'), (0.599009900990099, 0.99669966996699666, 0.09900990099009899, 0.0)), (('Retention01', 'EC93_uid'), (0.599009900990099, 1.0, 0.09900990099009899, 0.0)), (('Musa_EC_9_3', 'DXW_UID'), (0.7029702970297029, 0.0, 0.29702970297029707, 0.0)), (('Musa_EC_9_3', 'AtmosFox'), (0.7029702970297029, 0.0033003300330033004, 0.29702970297029707, 0.0)), (('Musa_EC_9_3', 'casuid01'), (0.7029702970297029, 0.0066006600660066007, 0.29702970297029707, 0.0)), (('Musa_EC_9_3', 'EC93_uid'), (0.7029702970297029, 0.0099009900990099011, 0.29702970297029707, 0.99009900990099009))]))

The plot comes out fine (with the exception of some of the labels looking a little funny--but that's not the issue).

The errors occur when I include row 64. My questions are, why does this row cause this error, and how can I fix it? I can see that the error occurs when trying to draw the image, but it is not at all obvious where the NaN is coming from, especially since the plot before worked just fine.

mosaic(mydata, ['id1','id2'])
(<matplotlib.figure.Figure object at 0x11D13ED0>, OrderedDict([(('default', 'DXW_UID'), (0.0, 0.0, 0.5373936408419167, 0.49342105263157893)), (('default', 'AtmosFox'), (0.0, 0.49671052631578938, 0.5373936408419167, 0.16447368421052627)), (('default', 'casuid01'), (0.0, 0.66447368421052622, 0.5373936408419167, 0.32894736842105265)), (('default', 'Angelica'), (0.0, 0.99671052631578938, 0.5373936408419167, 0.0)), (('default', 'EC93_uid'), (0.0, 1.0, 0.5373936408419167, 0.0)), (('TGP', 'DXW_UID'), (0.5423197492163009, 0.0, 0.08956560680698614, 0.0)), (('TGP', 'AtmosFox'), (0.5423197492163009, 0.0032894736842105261, 0.08956560680698614, 0.0)), (('TGP', 'casuid01'), (0.5423197492163009, 0.0065789473684210523, 0.08956560680698614, 0.0)), (('TGP', 'Angelica'), (0.5423197492163009, 0.0098684210526315784, 0.08956560680698614, 0.98684210526315785)), (('TGP', 'EC93_uid'), (0.5423197492163009, 1.0, 0.08956560680698614, 0.0)), (('Retention01', 'DXW_UID'), (0.6368114643976712, 0.0, 0.08956560680698614, 0.98684210526315785)), (('Retention01', 'AtmosFox'), (0.6368114643976712, 0.99013157894736836, 0.08956560680698614, 0.0)), (('Retention01', 'casuid01'), (0.6368114643976712, 0.99342105263157876, 0.08956560680698614, 0.0)), (('Retention01', 'Angelica'), (0.6368114643976712, 0.99671052631578938, 0.08956560680698614, 0.0)), (('Retention01', 'EC93_uid'), (0.6368114643976712, 1.0, 0.08956560680698614, 0.0)), (('Musa_EC_9_3', 'DXW_UID'), (0.7313031795790416, 0.0, 0.2686968204209583, 0.0)), (('Musa_EC_9_3', 'AtmosFox'), (0.7313031795790416, 0.0032894736842105261, 0.2686968204209583, 0.0)), (('Musa_EC_9_3', 'casuid01'), (0.7313031795790416, 0.0065789473684210523, 0.2686968204209583, 0.0)), (('Musa_EC_9_3', 'Angelica'), (0.7313031795790416, 0.0098684210526315784, 0.2686968204209583, 0.0)), (('Musa_EC_9_3', 'EC93_uid'), (0.7313031795790416, 0.013157894736842105, 0.2686968204209583, 0.98684210526315785))]))

When I run the above, I get this Traceback:

  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_qt4.py", line 374, in idle_draw
    self.draw()
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_qt4agg.py", line 154, in draw
    FigureCanvasAgg.draw(self)
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_agg.py", line 451, in draw
    self.figure.draw(self.renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\figure.py", line 1034, in draw
    func(*args)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 2086, in draw
    a.draw(renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axis.py", line 1096, in draw
    tick.draw(renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axis.py", line 241, in draw
    self.label1.draw(renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\text.py", line 598, in draw
    ismath=ismath, mtext=self)
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_agg.py", line 188, in draw_text
    font.get_image(), np.round(x - xd), np.round(y + yd) + 1, angle, gc)
ValueError: cannot convert float NaN to integer
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_qt4.py", line 299, in resizeEvent
    self.draw()
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_qt4agg.py", line 154, in draw
    FigureCanvasAgg.draw(self)
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_agg.py", line 451, in draw
    self.figure.draw(self.renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\figure.py", line 1034, in draw
    func(*args)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 2086, in draw
    a.draw(renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axis.py", line 1096, in draw
    tick.draw(renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axis.py", line 241, in draw
    self.label1.draw(renderer)
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\text.py", line 598, in draw
    ismath=ismath, mtext=self)
  File "C:\Python27\lib\site-packages\matplotlib\backends\backend_agg.py", line 188, in draw_text
    font.get_image(), np.round(x - xd), np.round(y + yd) + 1, angle, gc)
ValueError: cannot convert float NaN to integer

I ran the above code in the Spyder IDE with default settings.

A similar issue was addressed here, and numerical underflow was the culprit. However, if that is the case here, it is not at all obvious why.

DanDy
  • This is very interesting. It's a problem with `matplotlib` rendering the labels, nothing to do with the data. To see this, try `mosaic(mydata.replace({'Angelica':'Angelico'}), ['id1', 'id2'])` which should work fine. – LondonRob Jun 24 '15 at 16:41
  • I need some help debugging this: in `matplotlib/text.py`, there's a `Text` object whose `self._transform` has some `nan` values in one of its `Bbox`s. I'm out of my depth. – LondonRob Jun 24 '15 at 17:05
  • The problem seems to be caused by the axes labels, because setting `axes_label=False` _"solves"_ the problem: `mosaic(mydata, ['id1','id2'], axes_label=False)` – Primer Jun 24 '15 at 17:20
  • I get some strange behavior. If I copy the example I get the exception. After replacing the last `a` in 'Angelica' by an `a` (same letter), I don't get an exception. I'm using python 3.4. Aside: there is a RuntimeWarning for the labels because of the dimension that should be fixed soon. – Josef Jun 24 '15 at 17:51
  • If I run this repeatedly, then I get the exception every once in a while. My guess is now that it's a floating point issue in calculating the label coordinates (or something like that). – Josef Jun 24 '15 at 17:56
  • When I use the PR with the correction, then I don't get the exception in all my tries. https://github.com/statsmodels/statsmodels/pull/2286 – Josef Jun 24 '15 at 18:00

1 Answer

According to the docs the first parameter should be a contingency table. The fact that your way of doing things works at all seems to be an undocumented feature.

The behaviour you're seeing (including your "funny" looking labels) is because many of the entries in your contingency table are zero, and something in the labelling code of `mosaic` is having a hard time with that.

To see this, convert your DataFrame to a contingency table:

In [161]: pd.crosstab(mydata.id1, mydata.id2)
Out[161]: 
id2          Angelica  AtmosFox  DXW_UID  EC93_uid  casuid01
id1                                                         
Musa_EC_9_3         0         0        0         3         0
Retention01         0         0        1         0         0
TGP                 1         0        0         0         0
default             0         1        3         0         2
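
To make the number of empty cells explicit, here is a minimal sketch that rebuilds the table from the question's data and counts its zeros. The names `n_cells` and `n_zero` are only illustrative, and the DataFrame is re-typed from the listing in the question rather than copied from the original session.

```python
import pandas as pd

# Same (id1, id2) pairs as in the question, in index order 60..70.
mydata = pd.DataFrame({
    'id1': ['default', 'default', 'default', 'default', 'TGP', 'Retention01',
            'default', 'default', 'Musa_EC_9_3', 'Musa_EC_9_3', 'Musa_EC_9_3'],
    'id2': ['DXW_UID', 'AtmosFox', 'DXW_UID', 'DXW_UID', 'Angelica', 'DXW_UID',
            'casuid01', 'casuid01', 'EC93_uid', 'EC93_uid', 'EC93_uid'],
}, index=range(60, 71))

# Contingency table: one row per id1 value, one column per id2 value.
ct = pd.crosstab(mydata.id1, mydata.id2)

# Count the empty cells, i.e. the combinations that never occur in the data.
n_cells = ct.size
n_zero = int((ct == 0).sum().sum())
print(f"{n_zero} of {n_cells} cells are zero")
```

Most combinations of `id1` and `id2` never occur, so most tiles of the mosaic have zero area.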

Then add a "little bit" to every cell so that none of them is zero. The mosaic then works fine.

In [165]: ct = pd.crosstab(mydata.id1, mydata.id2)
In [166]: ctplus = ct + 1
In [167]: mosaic(ctplus.unstack())

Which results in the rather beautiful:

[Image: beautiful mosaic plot]

The tiny downside is that it's wrong, because adding 1 to every count distorts the proportions! But you can remedy that by doing

ctplus = ct + 1e-8

to just add a tiny bit to all those zeros. The plot still works (but looks ugly because the labels on all those zero tiles of the mosaic are all on top of each other):

[Image: a much uglier mosaic plot]
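
One possible way to tidy up the overlapping tile labels is to pass a custom `labelizer` to `mosaic` that returns an empty string for the padded (originally zero) cells. The sketch below is an assumption-laden illustration, not part of the original answer: the helper `label_nonzero` and the dictionaries `counts` and `padded` are hypothetical names, the `Agg` backend is chosen only so the code runs without a display, and the tick labels on the axes may still crowd each other.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic

# Same (id1, id2) pairs as in the question.
mydata = pd.DataFrame({
    'id1': ['default', 'default', 'default', 'default', 'TGP', 'Retention01',
            'default', 'default', 'Musa_EC_9_3', 'Musa_EC_9_3', 'Musa_EC_9_3'],
    'id2': ['DXW_UID', 'AtmosFox', 'DXW_UID', 'DXW_UID', 'Angelica', 'DXW_UID',
            'casuid01', 'casuid01', 'EC93_uid', 'EC93_uid', 'EC93_uid'],
})
ct = pd.crosstab(mydata.id1, mydata.id2)

# True counts keyed by (id1, id2), plus a padded copy with no exact zeros.
counts = {(r, c): int(ct.loc[r, c]) for r in ct.index for c in ct.columns}
padded = {key: n + 1e-8 for key, n in counts.items()}

def label_nonzero(key):
    """Show the count on populated tiles and nothing on the padded ones."""
    n = counts[key]
    return str(n) if n > 0 else ""

# mosaic accepts a dict keyed by category tuples; labelizer maps key -> text.
fig, rects = mosaic(padded, labelizer=label_nonzero)
```

The plot is still drawn from the padded values, so the zero tiles keep a negligible area, but only the tiles that hold real observations carry a label.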

LondonRob
  • Thanks LondonRob! I'm still not sure why `'Angelica'` was being so rude :), but the contingency table way solved the problem for me. I enjoyed seeing your way of tweaking the aesthetics too. The docs do suggest support for using DataFrames (see the very last paragraph in your link): "Using a DataFrame as source, specifying the name of the columns of interest >>> gender = ['male', 'male', 'male', 'female', 'female', 'female'] >>> pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat'] >>> data = pandas.DataFrame({'gender': gender, 'pet': pet}) >>> mosaic(data, ['pet', 'gender']) >>> pylab.show()". – DanDy Jun 25 '15 at 18:11
  • @LondonRob, very useful answer indeed. Is there some way to remove the text overlays? – miraculixx Jul 15 '16 at 20:47