
I could not search for this as thoroughly as I usually do, because I do not know what to call the thing that turned each value n into [n].

I believe it is a dict key but I am not sure.

I entered the following code:

def get_quantile_count(group, q=0.5):
    group=group.sort_index(by='prop', ascending=False)
    return group.prop.cumsum().searchsorted(q) + 1
diversity= top1000.groupby(['year','sex']).apply(get_quantile_count)
diversity=diversity.unstack('sex')

top1000.head() returns:

                 name sex  births  year      prop
year sex                                         
1880 F   0       Mary   F    7065  1880  0.077643
         1       Anna   F    2604  1880  0.028618
         2       Emma   F    2003  1880  0.022013
         3  Elizabeth   F    1939  1880  0.021309
         4     Minnie   F    1746  1880  0.019188

and diversity.head() returns

sex      F     M
year            
1880  [38]  [14]
1881  [38]  [14]
1882  [38]  [15]
1883  [39]  [15]
1884  [39]  [16]

I believe the [brackets] around the values are the reason that running diversity.plot(title='stuff') gives this error:

TypeError: Empty 'DataFrame': no numeric data to plot

Where are these square brackets creeping in?

LondonRob
lost

1 Answer


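The brackets come from searchsorted, which your function get_quantile_count calls in its return statement.

In the pandas version you are using, Series.searchsorted returns a NumPy array even when you pass it a single value, so each cell of diversity ends up holding a one-element array rather than a number. See this example from the docs: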

>>> x = pd.Series([1, 2, 3])
>>> x
0    1
1    2
2    3
dtype: int64
>>> x.searchsorted(4)
array([3])

As a rather unsatisfying workaround to this fact, you could just change the last line of the function to:

return group.prop.cumsum().searchsorted(q)[0] + 1
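
If you would rather not rely on what searchsorted happens to return (as far as I know, more recent pandas releases give back a plain integer for a scalar argument, in which case the [0] would fail), here is a minimal sketch that goes through np.searchsorted on the underlying values instead. The toy top1000 frame and the name quantile_count are made up for illustration, and the function takes the prop column directly rather than the whole group:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for top1000: one year, a handful of names per sex.
top1000 = pd.DataFrame({
    "name": ["Mary", "Anna", "Emma", "Ida", "John", "William", "James"],
    "sex": ["F", "F", "F", "F", "M", "M", "M"],
    "year": [1880] * 7,
    "prop": [0.4, 0.3, 0.2, 0.1, 0.6, 0.25, 0.15],
})


def quantile_count(props, q=0.5):
    """Number of top names needed to reach a cumulative share of q."""
    # Largest proportions first, then running total.
    cumulative = props.sort_values(ascending=False).cumsum()
    # np.searchsorted with a scalar q gives back a scalar, so int() is safe.
    return int(np.searchsorted(cumulative.to_numpy(), q)) + 1


diversity = (
    top1000.groupby(["year", "sex"])["prop"]
    .apply(quantile_count)
    .unstack("sex")
)
print(diversity)
print(diversity.dtypes)
```

Because every cell is now a plain integer, the columns get an integer dtype and diversity.plot has numeric data to work with.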

Or, if you don't want to edit the function for some reason, you can extract the first element of each of those arrays with applymap, which applies an arbitrary function to each element of a DataFrame:

In []: df
Out[]: 
         F     M
year            
1880  [38]  [14]
1881  [38]  [14]
1882  [38]  [15]
1883  [39]  [15]
1884  [39]  [16]
In []: df.applymap(lambda x: x[0])
Out[]: 
       F   M
year        
1880  38  14
1881  38  14
1882  38  15
1883  39  15
1884  39  16
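
The same unwrapping can be written without applymap, which I believe is deprecated in newer pandas releases in favour of DataFrame.map. A rough sketch, assuming a small made-up frame in which one-element lists stand in for the one-element arrays; it maps over each column and checks the dtypes before and after:

```python
import pandas as pd

# Made-up frame shaped like the one above; each cell holds a one-element list.
df = pd.DataFrame(
    {"F": [[38], [38], [39]], "M": [[14], [15], [15]]},
    index=pd.Index([1880, 1882, 1883], name="year"),
)
print(df.dtypes)  # object columns, which plot() does not treat as numeric

# Series.map is applied column by column, pulling out element 0 of each cell.
fixed = df.apply(lambda col: col.map(lambda cell: cell[0]))
print(fixed)
print(fixed.dtypes)  # integer columns, so there is numeric data to plot
```

The object dtype of the original columns is what triggers the "no numeric data to plot" error, so checking dtypes is a quick way to confirm the fix worked.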
LondonRob