pandas has a pd.qcut function which, when applied to a Series, returns a categorical Series of quantile bins (a single DataFrame column is a Series, so it works the same way). So to get back the bin for each row, you can do:
>>> import pandas as pd
# mock data
>>> df = pd.DataFrame({'sales_total': [1,2,162,126,126,12,7,1236,16,132,61,51]})
>>> cat_srs = pd.qcut(df['sales_total'], 10)
>>> print(cat_srs)
<<< 0 [1, 2.5]
1 [1, 2.5]
2 (159, 1236]
3 (100, 126]
4 (100, 126]
5 (8, 13.2]
6 (2.5, 8]
7 (159, 1236]
8 (13.2, 30]
9 (130.8, 159]
10 (56, 100]
11 (30, 56]
Name: sales_total, dtype: category
Categories (10, object): [...]
You can get the underlying categorical values with the Series' values attribute, and from there get each entry's bin code using the codes attribute:
>>> deciles = cat_srs.values.codes
>>> deciles
<<< array([0, 0, 9, 6, 6, 2, 1, 9, 3, 8, 5, 4], dtype=int8)
Which is what you need. From here you could assign the deciles to the data using df['decile'] = deciles, group entries using df.groupby('decile'), and so on.
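As a rough sketch of those next steps, the snippet below attaches the codes to the same mock data and aggregates per decile. The column name decile and the choice of count and sum as aggregations are only illustrative assumptions.

```python
import pandas as pd

# Same mock data as above
df = pd.DataFrame({'sales_total': [1, 2, 162, 126, 126, 12, 7, 1236, 16, 132, 61, 51]})

# Integer bin code (0 to 9) for each row
df['decile'] = pd.qcut(df['sales_total'], 10).values.codes

# Row count and sales sum per decile (illustrative aggregations)
summary = df.groupby('decile')['sales_total'].agg(['count', 'sum'])
print(summary)
```

Note that a decile with no rows in it simply does not appear in the grouped result.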
The one-liner for all of the above is pd.qcut(df['sales_total'], 10).values.codes.
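If only the integer codes are wanted, one possible shortcut (not part of the approach above, so treat it as an optional alternative) is qcut's labels=False argument, which returns the bin indices directly instead of a categorical. A minimal sketch on the same mock data, checking that both routes agree:

```python
import pandas as pd

# Same mock data as above
df = pd.DataFrame({'sales_total': [1, 2, 162, 126, 126, 12, 7, 1236, 16, 132, 61, 51]})

# Route 1: build the categorical, then read its integer codes
codes_via_categorical = pd.qcut(df['sales_total'], 10).values.codes

# Route 2: ask qcut for integer bin indicators directly
codes_direct = pd.qcut(df['sales_total'], 10, labels=False)

# Both should give the same bin number for every row
print(list(codes_via_categorical) == list(codes_direct))
```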
Edit: answering the modified question below, per the comments: I don't know of a way of doing this that's baked into a library. But assuming your data is relatively continuous, you can build the classes yourself like so:
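# each class should hold roughly a tenth of the total sales
target = df['sales_total'].sum() / 10
deciles = []
running_total = 0  # avoids shadowing the built-in sum()
classifier = 0
for val in df['sales_total']:
    # label the current row, then add its value to the running total
    deciles.append(classifier)
    running_total += val
    # once the class has passed the target, start the next class
    if running_total > target:
        classifier += 1
        running_total = 0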
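For a version you can run end to end, here is a sketch that wraps the same loop in a function and applies it to the mock data. The function name sum_based_classes, the column name sum_class, and the decision to sort the values first are all my assumptions for illustration, not something the loop itself requires.

```python
import pandas as pd

def sum_based_classes(values, n_classes=10):
    """Label values in order so each class holds roughly total / n_classes."""
    target = sum(values) / n_classes
    labels = []
    running_total = 0
    label = 0
    for val in values:
        # label the current value, then add it to the running total
        labels.append(label)
        running_total += val
        # once the class has passed the target, start the next class
        if running_total > target:
            label += 1
            running_total = 0
    return labels

# Same mock data as above, sorted so classes run from small to large sales
df = pd.DataFrame({'sales_total': [1, 2, 162, 126, 126, 12, 7, 1236, 16, 132, 61, 51]})
df = df.sort_values('sales_total').reset_index(drop=True)
df['sum_class'] = sum_based_classes(df['sales_total'])

# Total sales captured by each class
print(df.groupby('sum_class')['sales_total'].sum())
```

With data as skewed as this mock set (one value makes up most of the total), you end up with far fewer than ten classes, which is why the "relatively continuous" assumption matters.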