Split pandas series into buckets with qcut

Question

I'm trying to split a series into buckets of almost the same size, keeping the order and without putting equal items into different buckets.
I'm using qcut like this:

>>> import pandas as pd 
>>> pd.__version__
'0.20.3'
>>> x = [1,1,1,1,1,2,2,2,2,3,4]
>>> pd.qcut(x, 10, duplicates='drop').value_counts()
(0.999, 2.0]    9
(2.0, 3.0]      1
(3.0, 4.0]      1
dtype: int64

I was expecting this to split the first bucket into (0.999, 1.0] and (1.0, 2.0].
Why doesn't it? Is there another approach I should try?

score 0 · Answer 1 · answered Oct 25 '17 at 15:47

Use cut and specify your own intervals:

pd.cut(x, [0.999,1,2]).value_counts()
Out[242]: 
(0.999, 1.0]    5
(1.0, 2.0]      4
dtype: int64

BENY

But I don't know the intervals (the `x` from the question is a simple example, the real `x` is much bigger) – pomber Oct 25 '17 at 17:09
@pomber you can check this link https://stackoverflow.com/questions/30211923/what-is-the-difference-between-pandas-qcut-and-pandas-cut – BENY Oct 25 '17 at 17:12
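
When the intervals are not known in advance, it may help to look at the edges qcut actually settled on. A minimal sketch using the `x` from the question: with 11 values, the ten decile positions appear to land exactly on the sorted elements, so the quantiles are 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, and after `duplicates='drop'` only the edges 1, 2, 3, 4 remain. The minimum is the left edge of the first interval and is included in it, which would explain why the 1s and 2s end up together.

```python
import pandas as pd

x = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4]

# retbins=True returns the bin edges alongside the bucketed values
buckets, edges = pd.qcut(x, 10, duplicates='drop', retbins=True)

print(edges)               # the de-duplicated quantile edges
print(buckets.categories)  # the intervals built from those edges
```

There is no edge below 1 here, so 1 cannot get a bucket of its own unless an extra lower edge is supplied.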

score 0 · Answer 2 · answered Nov 21 '17 at 09:41

Try pd.cut, as below:

pd.cut(x, 3).value_counts()

(0.997, 2.0]    9
(2.0, 3.0]      1
(3.0, 4.0]      1

Play around with the number of bins you provide. Here I provided 3 bins, so it split into (0.997, 2.0], (2.0, 3.0] and (3.0, 4.0]. Note that pd.cut with an integer makes equal-width bins, not equal-count ones.

If you want to specify the bin edges yourself, pass them explicitly, as below:

bins = [0.999, 1.0, 2.0, 3.0, 4.0]
pd.cut(x, bins).value_counts()

(0.999, 1.0]    5
(1.0, 2.0]      4
(2.0, 3.0]      1
(3.0, 4.0]      1
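
If the real data is too large to write the edges by hand, one possible approach (a sketch, not the only way) is to let qcut compute the quantile edges and then prepend one edge just below the minimum before handing them to pd.cut. The 0.001 offset is an arbitrary assumption and should be smaller than the gap between distinct values in the real data.

```python
import numpy as np
import pandas as pd

x = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4]

# Let qcut work out the de-duplicated quantile edges
_, edges = pd.qcut(x, 10, duplicates='drop', retbins=True)

# Prepend an edge slightly below the minimum (offset is an assumption)
# so the smallest value gets an interval of its own
bins = np.concatenate(([edges[0] - 0.001], edges))

counts = pd.cut(x, bins).value_counts()
print(counts)
```

For this `x` the result has four buckets with 5, 4, 1 and 1 items, the same as the manual bins above. Equal values never straddle two buckets, but the bucket sizes can only be as even as the duplicates in the data allow.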

Hope this helps.
