
I have a dataframe with several categories and I want to use groupby to plot each category individually. However, the first category (or the first plot) is always plotted twice.

For example:

    import pandas as pd
    import numpy as np 
    import matplotlib.pyplot as plt

    n = 100000
    x = np.random.standard_normal(n)
    y1 = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
    y2 = 1.0 + 5.0 * x + 2.0 * np.random.standard_normal(n)

    df1 = pd.DataFrame({"A": x,
                        "B": y1})

    df2 =  pd.DataFrame({"A": x,
                         "B": y2})

    df1["Cat"] = "Cat1"
    df2["Cat"] = "Cat2"

    df = pd.concat([df1, df2], ignore_index=True)

    df.groupby("Cat").plot.hexbin(x="A", y="B",cmap = "jet")
    plt.show()

This will give me three plots, where Cat1 is plotted twice.

I just want two plots. What am I doing wrong?

petetheat
  • This may be related to the `apply` twice implementation: http://stackoverflow.com/questions/21390035/python-pandas-groupby-object-apply-method-duplicates-first-group. To work around this, I'd get the individual groups, iterate over them, and plot each one – EdChum Aug 18 '16 at 07:46
  • Ok, thanks a lot! I was going to avoid iterating over the groups, but if this is normal behavior then it's ok – petetheat Aug 18 '16 at 08:02

1 Answer


This is expected behaviour, see the warning in the docs:

Warning: In the current implementation `apply` calls `func` twice on the first group to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if `func` has side-effects, as they will take effect twice for the first group.

In your case, the plot function is the `func` with a side effect: it is called twice for the first group (Cat1), so that plot is drawn twice.
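
A minimal sketch of the workaround suggested in the comments, iterating over the groups yourself so that the plot call runs exactly once per group. The helper name `plot_groups`, the fixed seed, and the smaller sample size are my own choices for illustration, not part of the question:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of data as in the question, with a smaller n to keep it quick.
rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(n)
y1 = 2.0 + 3.0 * x + 4.0 * rng.standard_normal(n)
y2 = 1.0 + 5.0 * x + 2.0 * rng.standard_normal(n)

df = pd.concat(
    [
        pd.DataFrame({"A": x, "B": y1, "Cat": "Cat1"}),
        pd.DataFrame({"A": x, "B": y2, "Cat": "Cat2"}),
    ],
    ignore_index=True,
)


def plot_groups(frame, by="Cat"):
    """Draw one hexbin figure per group and return {group name: Axes}.

    Hypothetical helper: it loops over the groups explicitly instead of
    going through groupby(...).plot, so each group is plotted only once.
    """
    axes = {}
    for name, group in frame.groupby(by):
        ax = group.plot.hexbin(x="A", y="B", cmap="jet")
        ax.set_title(name)
        axes[name] = ax
    return axes


axes = plot_groups(df)
```

Calling `plt.show()` afterwards should display two figures, one for Cat1 and one for Cat2.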

Mathias711