Pandas groupby plot gives first plot twice

Question

I have a dataframe with several categories and I want to use groupby to plot each category individually. However, the first category (or the first plot) is always plotted twice.

For example:

    import pandas as pd
    import numpy as np 
    import matplotlib.pyplot as plt

    n = 100000
    x = np.random.standard_normal(n)
    y1 = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
    y2 = 1.0 + 5.0 * x + 2.0 * np.random.standard_normal(n)

    df1 = pd.DataFrame({"A": x,
                        "B": y1})

    df2 =  pd.DataFrame({"A": x,
                         "B": y2})

    df1["Cat"] = "Cat1"
    df2["Cat"] = "Cat2"

    df = pd.concat([df1, df2], ignore_index=True)  # DataFrame.append was removed in pandas 2.0

    df.groupby("Cat").plot.hexbin(x="A", y="B", cmap="jet")
    plt.show()

This will give me three plots, where Cat1 is plotted twice.

I just want two plots. What am I doing wrong?

This may be related to `apply` calling the function twice on the first group: http://stackoverflow.com/questions/21390035/python-pandas-groupby-object-apply-method-duplicates-first-group. To work around this, I'd iterate over the individual groups and plot each one — EdChum, Aug 18 '16 at 07:46
OK, thanks a lot! I was hoping to avoid iterating over the groups, but if this is normal behavior then that's fine — petetheat, Aug 18 '16 at 08:02

score 1 · Accepted Answer · answered Aug 18 '16 at 07:45

This is expected behaviour, see the warning in the docs:

Warning: In the current implementation apply calls func twice on the first group to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first group.

In your case, the plot function is called twice for the first group (Cat1), which is why that plot appears twice in the result.
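One way around this is the workaround suggested in the comments: iterate over the groups yourself, so the plotting call runs exactly once per category. The following is only a minimal sketch under a few assumptions: it rebuilds a smaller version of the question's data with `pd.concat`, and the `axes` dictionary and the per-plot titles are illustrative additions, not part of the original code.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Smaller stand-in for the data in the question (n reduced for speed).
rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(n)
df1 = pd.DataFrame({"A": x, "B": 2.0 + 3.0 * x + 4.0 * rng.standard_normal(n)})
df2 = pd.DataFrame({"A": x, "B": 1.0 + 5.0 * x + 2.0 * rng.standard_normal(n)})
df1["Cat"] = "Cat1"
df2["Cat"] = "Cat2"
df = pd.concat([df1, df2], ignore_index=True)

# Iterating over the GroupBy object yields (name, sub-DataFrame) pairs,
# so the plot call is made explicitly, once for each group.
axes = {}  # hypothetical helper dict: category name -> Axes
for name, group in df.groupby("Cat"):
    ax = group.plot.hexbin(x="A", y="B", cmap="jet")  # new figure per group
    ax.set_title(name)
    axes[name] = ax
```

Call `plt.show()` afterwards to display the figures. Because each `hexbin` call is made without an `ax` argument, every category should end up in its own figure.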
