I've got a data frame and want to filter or bin by a range of values and then get the counts of values in each bin.

Currently, I'm doing this:

x = 5
y = 17
z = 33
filter_values = [x, y, z]
filtered_a = df[df.filtercol <= x]
a_count = filtered_a.filtercol.count()

filtered_b = df[df.filtercol > x]
filtered_b = filtered_b[filtered_b.filtercol <= y]
b_count = filtered_b.filtercol.count()

filtered_c = df[df.filtercol > y]
c_count = filtered_c.filtercol.count()

But is there a more concise way to accomplish the same thing?

monkut

1 Answer

Perhaps you are looking for pandas.cut:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(50), columns=['filtercol'])
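filter_values = [0, 5, 17, 33]
# Bins are right-closed: (0, 5], (5, 17], (17, 33]; values outside them become NaN
out = pd.cut(df.filtercol, bins=filter_values)
# counts is a Series, sorted by count (largest first);
# the top-level pd.value_counts(out) is deprecated in recent pandas
counts = out.value_counts()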
print(counts)

yields

(17, 33]    16
(5, 17]     12
(0, 5]       5

To reorder the result so the bin ranges appear in order, you could use

counts.sort_index()

which yields

(0, 5]       5
(5, 17]     12
(17, 33]    16

Thanks to nivniv and InLaw for this improvement.
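
As a small check of how sort_index() orders the bins, the sketch below uses made-up values and bin edges (both are assumptions for illustration, not taken from the question) where the label text and the numeric order disagree. The index produced by pandas.cut is categorical, so the sort should follow the position of each interval rather than the text of its label:

```python
import pandas as pd

# Illustrative data and edges, chosen so that "(17, 100]" would sort
# before "(5, 17]" if the labels were compared as plain text
values = pd.Series([1, 3, 7, 12, 20, 60, 90])
out = pd.cut(values, bins=[0, 5, 17, 100])

# Count per bin, then order the bins by interval position
ordered = out.value_counts().sort_index()
labels = [str(interval) for interval in ordered.index]

print(labels)           # bins in interval order
print(sorted(labels))   # the same labels in text order, for comparison
```

If the two printed lists differ, the ordering is coming from the intervals themselves and not from a lexicographic sort of the labels.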


See also Discretization and quantiling.
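
Note that the bins [0, 5, 17, 33] leave out 0 and everything above 33, whereas the code in the question counts every value at or below x and every value above y. One way to reproduce those three counts, sketched here under the assumption that open-ended outer bins are what is wanted, is to use infinite outer edges (the names edges and binned are introduced only for this example):

```python
import numpy as np
import pandas as pd

# Same toy frame as in the answer: filtercol holds 0..49
df = pd.DataFrame(np.arange(50), columns=['filtercol'])

x, y = 5, 17
# Assumption: infinite outer edges, so no value falls outside the bins
edges = [-np.inf, x, y, np.inf]

# Right-closed bins: (-inf, 5], (5, 17], (17, inf]
binned = pd.cut(df.filtercol, bins=edges)
counts = binned.value_counts().sort_index()

a_count, b_count, c_count = counts.tolist()
print(a_count, b_count, c_count)  # prints: 6 12 32
```

Because pandas.cut uses right-closed intervals by default, the first bin matches `<= x`, the second matches `> x` and `<= y`, and the third matches `> y`.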

unutbu
  • yes! Is there a way to sort the index/keys of the resulting value_counts() object? – monkut Jan 22 '13 at 04:10
  • @unutbu counts.reindex(counts.index) won't work for me on another example. does it sort lexicographically? – nivniv Aug 08 '15 at 16:36
  • No, `reindex` doesn't do any sorting. It relies on the order provided by `counts.index`. Perhaps it would be best to post a new question with a simple example demonstrating the problem and the desired output. – unutbu Aug 08 '15 at 17:30
  • ok I think I figured it out with counts.sort_index(). Thanks though – nivniv Aug 08 '15 at 18:00