Pandas returning empty groups in groupby

Question

I have a Pandas DataFrame with three columns: target, pred, and conf_bin. If I run groupby(by='conf_bin').apply(...), my apply function gets called with empty DataFrames for values that do not appear in the conf_bin column. How is this possible?

Details

The DataFrame looks something like this:

        target  pred conf_bin
0            5     6     0.50
1            4     4     0.60
2            4     4     0.50
3            4     3     0.50
4            4     5     0.50
5            5     5     0.55
6            5     5     0.55
7            5     5     0.55

Obviously conf_bin is a numeric bin with values in the range np.arange(0, 1, 0.05). However, not all values are present in the data:

In [224]: grp = tp.groupby(by='conf_bin')

In [225]: grp.groups.keys()
Out[225]: dict_keys([0.5, 0.60000000000000009, 0.35000000000000003, 0.75, 0.85000000000000009, 0.65000000000000002, 0.55000000000000004, 0.80000000000000004, 0.20000000000000001, 0.45000000000000001, 0.40000000000000002, 0.30000000000000004, 0.70000000000000007, 0.25])

So, for example, the values 0 and 0.05 do not appear. However, when I run an apply on the group my function does get called for these values:

In [226]: grp.apply(lambda x: x.shape)
Out[226]:
conf_bin
0.00        (0, 3)
0.05        (0, 3)
0.10        (0, 3)
0.15        (0, 3)
0.20       (22, 3)
0.25       (75, 3)
0.30       (95, 3)
0.35      (870, 3)
0.40     (8505, 3)
0.45    (40068, 3)
0.50    (51238, 3)
0.55    (54305, 3)
0.60    (47191, 3)
0.65    (38977, 3)
0.70    (34444, 3)
0.75    (20435, 3)
0.80     (3352, 3)
0.85        (4, 3)
0.90        (0, 3)
dtype: object

Questions:

How can Pandas even know that the values 0.0 and 0.05 "make sense", since they don't appear in my DataFrame?
Why is it calling my apply function with empty DataFrame objects for values that do not appear in grp.groups?

Can you provide a self-contained example with sample data demonstrating the problem? — BrenBarn, Oct 26 '16 at 19:36
What are the `dtypes`? Is it possible that the column is categorical, with the information about all the bins stored in the category spec? — piRSquared, Oct 26 '16 at 19:52
@piRSquared is correct. The dtype of `conf_bin` is `category`. Thanks!! — Oliver Dain, Oct 26 '16 at 19:54
Please refer to https://stackoverflow.com/a/50579578/4755520 for the categorical case. TL;DR use `.groupby(..., observed=True)`. — ayorgo, Feb 25 '20 at 10:16
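To make the `observed=True` suggestion concrete, here is a minimal sketch. The sample data and the category list are made up for illustration (they are not the asker's data), and `size()` is used instead of `apply` because how `apply` treats empty groups can differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample: conf_bin is categorical and declares two bins
# (0.45 and 0.55) that never occur in the rows.
df = pd.DataFrame({
    "target": [5, 4, 4, 4],
    "pred": [6, 4, 4, 3],
    "conf_bin": pd.Categorical(
        [0.50, 0.60, 0.50, 0.50],
        categories=[0.45, 0.50, 0.55, 0.60],
    ),
})

# observed=False: one group per declared category, including empty ones.
all_bins = df.groupby("conf_bin", observed=False).size()

# observed=True: only categories that actually occur in the data.
seen_bins = df.groupby("conf_bin", observed=True).size()

print(all_bins)
print(seen_bins)
```

Passing `observed` explicitly also avoids relying on the default, which is not the same in every pandas release.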

score 2 · Answer 1 · edited May 14 '20 at 06:33

I too was having this problem, which popped up when trying to create subplots for every category in my dataframe.

I came up with the following workaround (based on this SO post), by pulling out the non-empty groups into a list.

groups = df.groupby('conf_bin')
group_list = [(index, group) for index, group in groups if len(group) > 0]

It does break the implicit contract that "you wrangle your data in pandas", and probably mismanages memory, but it works.

Now you can iterate through your groupby list with the same interface as with a groupby object, e.g.

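import matplotlib.pyplot as plt

# squeeze=False keeps axes a 2-D array even when there is only one group
fig, axes = plt.subplots(nrows=len(group_list), ncols=1, squeeze=False)
for (index, group), ax in zip(group_list, axes.flatten()):
    group['target'].plot(ax=ax, title=str(index))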

score 0 · Answer 2 · answered Jun 12 '23 at 18:47

Your grouping column is of categorical type and has information about additional possible groups not in your data.
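A small sketch of how one might check this, assuming a made-up categorical column rather than the asker's actual data: the dtype shows whether the column is categorical, `.cat.categories` lists every declared group, and `.cat.remove_unused_categories()` drops the ones that never occur.

```python
import pandas as pd

# Hypothetical column: four declared bins, only two of them present.
conf_bin = pd.Series(
    pd.Categorical([0.50, 0.60, 0.50], categories=[0.0, 0.05, 0.50, 0.60])
)

print(conf_bin.dtype)  # a categorical dtype, not float64

# Every declared category, whether or not it appears in the data.
declared = list(conf_bin.cat.categories)

# Same values, but the category list is trimmed to what is present.
trimmed = conf_bin.cat.remove_unused_categories()
remaining = list(trimmed.cat.categories)

print(declared)
print(remaining)
```

Grouping by the trimmed column should then produce no groups for the dropped bins.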

Iurii Shcherbak
