How to use a specific list of bins for multiple histograms from DataFrame, when using plotly+cufflinks?

Question

It is relatively easy to manually give a list of bins when plotting a histogram with matplotlib, as shown for example here.

A simple example of this is the following:

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.hist(np.random.randn(10000), bins=np.arange(-4, 4, 0.1))
ax.hist(0.2 * np.random.randn(10000), bins=np.arange(-4, 4, 0.1))
plt.show()

The same can be done from a pandas.DataFrame with:

import pandas as pd

pd.DataFrame({
    'firstHistogram': np.random.randn(10000),
    'secondHistogram': 0.2 * np.random.randn(10000)
}).plot(kind='hist', bins=np.arange(-4, 4, 0.1))

Going further, plotly can interface directly with pandas through the cufflinks module, which allows things like the following:

pd.DataFrame({
    'firstHistogram': np.random.randn(10000),
    'secondHistogram': 0.2 * np.random.randn(10000)
}).iplot(kind='hist', bins=100)

But here is the catch: the iplot method provided by cufflinks does not seem to accept a list for bins. When a number is provided, as in the above example, that number is used to bin each dataset independently. This results in unequal binning, with potentially misleading results (see the equal heights in the above plot).

While this effect can be somewhat mitigated using the histnorm='density' option, one may want to see the counts per bin and not a density.
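The unequal binning described above can be checked numerically without any plotting library. The following is a minimal sketch using only NumPy; the variable names and the fixed seed are illustrative assumptions, and it only mimics what independent binning does rather than reproducing the internals of cufflinks.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
wide = rng.standard_normal(10000)
narrow = 0.2 * rng.standard_normal(10000)

# Independent binning: 100 bins spread over each dataset's own min..max range,
# so the two datasets end up with different bin widths.
_, wide_edges = np.histogram(wide, bins=100)
_, narrow_edges = np.histogram(narrow, bins=100)
wide_width = wide_edges[1] - wide_edges[0]
narrow_width = narrow_edges[1] - narrow_edges[0]
print("independent bin widths:", wide_width, narrow_width)

# Shared binning: the same explicit edges for both datasets,
# so the counts per bin are directly comparable.
shared_edges = np.arange(-4, 4, 0.1)
wide_counts, _ = np.histogram(wide, bins=shared_edges)
narrow_counts, _ = np.histogram(narrow, bins=shared_edges)
print("peak counts with shared bins:", wide_counts.max(), narrow_counts.max())
```

With shared edges the narrower distribution shows a much taller peak, which is the difference that independent binning hides.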

Is there a way around this?

score 5 · Accepted Answer · answered Aug 21 '17 at 04:24

I have added an update for this. You should now be able to specify bins=(start,end,size):

pd.DataFrame({
    'firstHistogram': np.random.randn(10000),
    'secondHistogram': 0.2 * np.random.randn(10000)
}).iplot(kind='hist', bins=(-4, 4, .08))

This should now return a histogram with the custom bins.
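To relate the (start, end, size) form to the list of bin edges used with matplotlib and pandas in the question, the tuple can be expanded into explicit edges. This is a rough sketch: bins_tuple_to_edges is a hypothetical helper, not part of cufflinks or plotly, and it assumes the bins run from start in steps of size, which may not match how plotly aligns its bins in every case.

```python
import numpy as np

def bins_tuple_to_edges(start, end, size):
    """Hypothetical helper: expand (start, end, size) into explicit bin edges."""
    # Round to avoid floating-point surprises such as 8 / 0.08 != 100 exactly.
    n_bins = int(round((end - start) / size))
    # linspace avoids the accumulated error that np.arange can show with float steps.
    return np.linspace(start, start + n_bins * size, n_bins + 1)

edges = bins_tuple_to_edges(-4, 4, 0.08)
print(len(edges) - 1, "bins from", edges[0], "to", edges[-1])
```

The resulting array can be passed as bins to ax.hist or DataFrame.plot(kind='hist') to get a comparable binning outside of plotly.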

jorge.santos

Perfect, thanks so much! That documentation on plotly should be updated to reflect this functionality: https://plot.ly/ipython-notebooks/cufflinks/ – James Paul Mason Apr 03 '19 at 18:51
I've submitted a pull request to do so: https://github.com/plotly/documentation/pull/1295 – James Paul Mason Apr 03 '19 at 19:05
As far as I can see, this functionality is still not present – Andrea Mar 03 '20 at 14:56

score 2 · Answer 2 · answered Aug 20 '17 at 20:06

As far as I know there is no direct way of doing it in cufflinks. The output shown in your code is wrong in my opinion, i.e. I think that's a bug in cufflinks.

But you can easily imitate the cufflinks function with a few lines of code. You can get the same layout with cufflinks.getLayout() and just need to set barmode to overlay.

import numpy as np
import pandas as pd
import plotly
import cufflinks

plotly.offline.init_notebook_mode()

# Keep a reference to the DataFrame so the loop below can iterate over its columns
df = pd.DataFrame({
    'firstHistogram': np.random.randn(10000),
    'secondHistogram': 0.2 * np.random.randn(10000)
})

data = list()

# One histogram trace per column, all sharing the same explicit bins
for dd in df:
    histo = plotly.graph_objs.Histogram(x=df[dd],
                                        name=dd,
                                        xbins={'start': -4, 'end': 4, 'size': 0.08},
                                        autobinx=False,
                                        opacity=0.8
                                       )
    data.append(histo)

# Reuse the cufflinks layout and overlay the bars instead of stacking them
layout = plotly.graph_objs.Layout(cufflinks.getLayout(),
                                  barmode='overlay')
fig = plotly.graph_objs.Figure(data=data,
                               layout=layout)
plotly.offline.iplot(fig)
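The loop above can also be wrapped in a small reusable function. The sketch below is an assumption-laden variant: histogram_figure is a hypothetical name, it builds the figure as a plain dictionary (which plotly accepts in place of a Figure object), and it leaves out the cufflinks layout, so the styling will differ from the cufflinks default.

```python
import numpy as np
import pandas as pd

def histogram_figure(df, start, end, size, opacity=0.8):
    """Hypothetical helper: one overlaid histogram trace per column, same bins for all."""
    traces = [
        {
            "type": "histogram",
            "x": df[col].tolist(),
            "name": col,
            "xbins": {"start": start, "end": end, "size": size},
            "autobinx": False,
            "opacity": opacity,
        }
        for col in df.columns
    ]
    # barmode='overlay' draws the histograms on top of each other
    return {"data": traces, "layout": {"barmode": "overlay"}}

df = pd.DataFrame({
    "firstHistogram": np.random.randn(10000),
    "secondHistogram": 0.2 * np.random.randn(10000),
})
fig = histogram_figure(df, start=-4, end=4, size=0.08)
# In a notebook, the dictionary could then be passed to plotly.offline.iplot(fig).
```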
