Pandas groupby how to compute counts in ranges

Question

Say I have a huge list of numbers between 0 and 100. I compute ranges, depending on the max number and then saying there are 10 bins. So my ranges are for example

ranges = [0,10,20,30,40,50,60,70,80,90,100]

Now I count the occurances in each range from 0-10, 10-20, and so on. I iterate over every number in the list and check for a range. I assume this is not the best way in terms of runtime speed.

Can I fasten it up by using pandas, e.g. pandas.groupby, and how?

score 40 · Accepted Answer · edited Jul 04 '20 at 09:19

We can use pd.cut to bin the values into ranges, then we can groupby these ranges, and finally call count to count the values now binned into these ranges:

np.random.seed(0)
df = pd.DataFrame({"a": np.random.random_integers(1, high=100, size=100)})
ranges = [0,10,20,30,40,50,60,70,80,90,100]
df.groupby(pd.cut(df.a, ranges)).count()

            a
a            
(0, 10]    11
(10, 20]   10
(20, 30]    8
(30, 40]   13
(40, 50]   11
(50, 60]    9
(60, 70]   10
(70, 80]   11
(80, 90]   13
(90, 100]   4

score 35 · Answer 2 · answered Jul 04 '20 at 09:39

Surprised I haven't seen this yet, so without further ado, here is

`.value_counts(bins=N)`

Computing bins with pd.cut followed by a groupBy is a 2-step process. value_counts allows you a shortcut using the bins argument:

# Uses Ed Chum's setup. Cross check our answers match!
np.random.seed(0)
df = pd.DataFrame({"a": np.random.random_integers(1, high=100, size=100)})

df['a'].value_counts(bins=10, sort=False)

(0.9, 10.9]      11
(10.9, 20.8]     10
(20.8, 30.7]      8
(30.7, 40.6]     13
(40.6, 50.5]     11
(50.5, 60.4]      9
(60.4, 70.3]     10
(70.3, 80.2]     11
(80.2, 90.1]     13
(90.1, 100.0]     4
Name: a, dtype: int64

This creates 10 evenly-spaced right-closed intervals and bincounts your data. sort=False will be required to avoid value_counts ordering the result in decreasing order of count.

Binning by Unequal Ranges

For this, you can pass a list to bins argument:

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
df['a'].value_counts(bins=bins, sort=False) 

(-0.001, 10.0]    11
(10.0, 20.0]      10
(20.0, 30.0]       8
(30.0, 40.0]      13
(40.0, 50.0]      11
(50.0, 60.0]       9
(60.0, 70.0]      10
(70.0, 80.0]      11
(80.0, 90.0]      13
(90.0, 100.0]      4
Name: a, dtype: int64

Is there a way to do this with right-open and left-closed bins, or any variant of that? — Prithvi Thakur, Nov 04 '22 at 17:50

Pandas groupby how to compute counts in ranges

2 Answers2

`.value_counts(bins=N)`

Binning by Unequal Ranges

Linked

Related