I have a Pandas DataFrame with 3 columns: target, pred, and conf_bin. If I run groupby(by='conf_bin').apply(...), my apply function gets called with empty DataFrames for values that do not appear in the conf_bin column. How is this possible?
Details
The DataFrame looks something like this:
   target  pred  conf_bin
0       5     6      0.50
1       4     4      0.60
2       4     4      0.50
3       4     3      0.50
4       4     5      0.50
5       5     5      0.55
6       5     5      0.55
7       5     5      0.55
Obviously, conf_bin is a numeric bin with values in the range np.arange(0, 1, 0.05). However, not all values are present in the data:
In [224]: grp = tp.groupby(by='conf_bin')
In [225]: grp.groups.keys()
Out[225]: dict_keys([0.5, 0.60000000000000009, 0.35000000000000003, 0.75, 0.85000000000000009, 0.65000000000000002, 0.55000000000000004, 0.80000000000000004, 0.20000000000000001, 0.45000000000000001, 0.40000000000000002, 0.30000000000000004, 0.70000000000000007, 0.25])
So, for example, the values 0 and 0.05 do not appear. However, when I run an apply on the group, my function does get called for these values:
In [226]: grp.apply(lambda x: x.shape)
Out[226]:
conf_bin
0.00 (0, 3)
0.05 (0, 3)
0.10 (0, 3)
0.15 (0, 3)
0.20 (22, 3)
0.25 (75, 3)
0.30 (95, 3)
0.35 (870, 3)
0.40 (8505, 3)
0.45 (40068, 3)
0.50 (51238, 3)
0.55 (54305, 3)
0.60 (47191, 3)
0.65 (38977, 3)
0.70 (34444, 3)
0.75 (20435, 3)
0.80 (3352, 3)
0.85 (4, 3)
0.90 (0, 3)
dtype: object
Questions:
- How can Pandas even know that the values 0.0 and 0.05 "make sense", since they don't appear in my DataFrame?
- Why is it calling my apply function with empty DataFrame objects for values that do not appear in grp.groups?
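
A minimal sketch that reproduces similar behaviour on a small frame. It rests on an assumption that nothing shown above confirms: that conf_bin is stored as a categorical column whose categories cover every bin, not only the ones present in the data. The category list and the variable names sizes_all and sizes_obs are made up for illustration. Exact behaviour of apply on empty groups may differ between pandas versions, so the sketch uses size() to show the group keys.

```python
import pandas as pd

# Hypothetical full set of bins: 0.00, 0.05, ..., 0.95
all_bins = [i / 20 for i in range(20)]

tp = pd.DataFrame({
    "target": [5, 4, 4, 4, 4, 5, 5, 5],
    "pred": [6, 4, 4, 3, 5, 5, 5, 5],
    "conf_bin": [0.50, 0.60, 0.50, 0.50, 0.50, 0.55, 0.55, 0.55],
})

# Assumption: the column is categorical and declares all bins as categories,
# even though only 0.50, 0.55 and 0.60 occur in the rows.
tp["conf_bin"] = pd.Categorical(tp["conf_bin"], categories=all_bins)

# observed=False keeps one group per declared category, including empty ones.
sizes_all = tp.groupby("conf_bin", observed=False).size()

# observed=True keeps only the categories that actually occur in the data.
sizes_obs = tp.groupby("conf_bin", observed=True).size()

print(tp["conf_bin"].dtype)
print(sizes_all)
print(sizes_obs)
```

If the real column turns out to have a plain float dtype, this sketch does not apply; checking tp['conf_bin'].dtype would settle which case it is.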