Better binning in pandas

Question

I've got a data frame and want to filter or bin by a range of values and then get the counts of values in each bin.

Currently, I'm doing this:

x = 5
y = 17
z = 33
filter_values = [x, y, z]
filtered_a = df[df.filtercol <= x]
a_count = filtered_a.filtercol.count()

filtered_b = df[df.filtercol > x]
filtered_b = filtered_b[filtered_b.filtercol <= y]
b_count = filtered_b.filtercol.count()

filtered_c = df[df.filtercol > y]
c_count = filtered_c.filtercol.count()

But is there a more concise way to accomplish the same thing?

unutbu · Accepted Answer · 2018-04-17T11:34:22.997


Perhaps you are looking for pandas.cut:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(50), columns=['filtercol'])
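filter_values = [0, 5, 17, 33]
# pd.cut assigns each value to a half-open interval (left, right]
out = pd.cut(df.filtercol, bins=filter_values)
# count the rows in each bin; counts is a Series, sorted by count (descending)
counts = out.value_counts()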
print(counts)

yields

(17, 33]    16
(5, 17]     12
(0, 5]       5

To reorder the result so the bin ranges appear in order, you could use

counts.sort_index()

which yields

(0, 5]       5
(5, 17]     12
(17, 33]    16

Thanks to nivniv and InLaw for this improvement.
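As an alternative sketch (not part of the original answer): because pd.cut returns an ordered categorical, passing sort=False to value_counts should leave the bins in category order, so the separate sort_index() step is not needed. This assumes the same example frame as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(50), columns=['filtercol'])
out = pd.cut(df['filtercol'], bins=[0, 5, 17, 33])

# sort=False skips the sort-by-count step; for categorical data the
# result then follows the category (bin) order
counts = out.value_counts(sort=False)
print(counts)
```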

See also the "Discretization and quantiling" section of the pandas documentation.
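Note that with bins [0, 5, 17, 33], the value 0 and everything above 33 fall outside every interval and are left uncounted. To reproduce the three ranges from the question (<= x, between x and y, > y), one option is to use infinite outer edges. This is a minimal sketch; the labels 'a', 'b', 'c' are hypothetical names chosen to match the question's variables.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(50), columns=['filtercol'])

x, y = 5, 17
# -inf and +inf as outer edges make the first and last bins open-ended,
# so every row lands in some bin
edges = [-np.inf, x, y, np.inf]
labels = ['a', 'b', 'c']  # hypothetical labels, one per bin

binned = pd.cut(df['filtercol'], bins=edges, labels=labels)
# sort_index() orders the counts by bin rather than by frequency
counts = binned.value_counts().sort_index()
print(counts)
```

Here counts['a'] corresponds to a_count in the question, counts['b'] to b_count, and counts['c'] to c_count.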

edited Apr 17 '18 at 11:34

answered Jan 22 '13 at 03:46


yes! Is there a way to sort the index/keys of the resulting value_counts() object? – monkut Jan 22 '13 at 04:10
@unutbu counts.reindex(counts.index) won't work for me on another example. does it sort lexicographically? – nivniv Aug 08 '15 at 16:36
No, `reindex` doesn't do any sorting. It relies on the order provided by `counts.index`. Perhaps it would be best to post a new question with a simple example demonstrating the problem and the desired output. – unutbu Aug 08 '15 at 17:30

ok I think I figured it out with counts.sort_index(). Thanks though – nivniv Aug 08 '15 at 18:00
