I'm using `qcut` from pandas to discretize my data into equal-sized buckets. I want to have price buckets. This is my DataFrame:

        productId   sell_prix   categ   popularity
11997   16758760.0  28.75        50      524137.0
11998   16758760.0  28.75        50      166795.0
13154   16782105.0  24.60        50      126890.5
13761   16790082.0  65.00        50      245437.0
13762   16790082.0  65.00        50      245242.0
15355   16792720.0  29.00        50      360219.0
15356   16792720.0  29.00        50      360100.0
15357   16792720.0  29.00        50      360027.0
15358   16792720.0  29.00        50      462850.0
15367   16792728.0  29.00        50      193030.5

And this is my code:

df['PriceBucket'] = pd.qcut(df['sell_prix'], 3)

I get this error message:

**ValueError: Bin edges must be unique: array([ 24.6,  29. ,  29. ,  65. ])**

In reality, my DataFrame has 7413 rows, so this is just a sample of it. The strange thing is that the same code works on a DataFrame with 359824 rows containing practically the same data. Does the result depend on the length of the DataFrame?

Please help! Many thanks.

Arij SEDIRI
  • If you sort the df column, does it work? `df['PriceBucket'] = pd.qcut(df['sell_prix'].sort_values(), 3)` – EdChum Jul 11 '16 at 14:13
  • There aren't enough unique values in 'sell_prix' in your smaller DataFrame to break the range into 3 buckets. Hence the endpoints of the first and middle buckets are the same, which is why you are getting an error. – Fortunato Jul 11 '16 at 14:36
  • See http://stackoverflow.com/questions/20158597/how-to-qcut-with-non-unique-bin-edges?rq=1 – dukebody Jul 11 '16 at 14:45
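
The point made in the comments can be checked on the ten sample rows from the question. This is a minimal sketch, assuming only the `sell_prix` values shown above; the variable names `prices`, `share_of_mode` and `raised` are made up for illustration. Because 29.00 accounts for half of the rows, both inner cut points (the 1/3 and 2/3 quantiles) land on it, which matches the repeated `29.` in the error message.

```python
import pandas as pd

# The ten sell_prix values from the sample in the question.
prices = pd.Series(
    [28.75, 28.75, 24.60, 65.00, 65.00, 29.00, 29.00, 29.00, 29.00, 29.00],
    name="sell_prix",
)

# Fraction of rows taken by the most frequent price (29.00 here).
share_of_mode = prices.value_counts(normalize=True).iloc[0]
print(prices.value_counts())

# With one value covering half the rows, two of the three-quantile
# edges coincide, so qcut refuses to build the bins.
try:
    pd.qcut(prices, 3)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

So the error depends on how the values are distributed rather than on the number of rows as such: a larger DataFrame may simply have enough distinct prices around the cut points.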

1 Answer

Various solutions are discussed here, but briefly: rank the values first so that tied values get distinct ranks, then bin the ranks:

> pd.qcut(df['a'].rank(method='first'), 3)
0        [1, 2.333]
1        [1, 2.333]
2    (2.333, 3.667]
3        (3.667, 5]
4        (3.667, 5]

Or, to get integer bucket labels instead of intervals:

> pd.qcut(df['a'].rank(method='first'), 3, labels=False)
0    0
1    0
2    1
3    2
4    2
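
Applied to the question's column, the same idea might look like the sketch below. It assumes the ten sample `sell_prix` values from the question; the column name `PriceBucketDrop` is made up for illustration. The second option uses the `duplicates="drop"` argument of `pd.qcut`, which to my knowledge is available from pandas 0.20 onward, so treat it as an assumption if you are on an older version.

```python
import pandas as pd

# Sample prices from the question.
df = pd.DataFrame(
    {"sell_prix": [28.75, 28.75, 24.60, 65.00, 65.00,
                   29.00, 29.00, 29.00, 29.00, 29.00]}
)

# Option 1: rank first. method="first" gives tied prices distinct ranks
# (1..n, in order of appearance), so the quantile edges are unique.
ranks = df["sell_prix"].rank(method="first")
df["PriceBucket"] = pd.qcut(ranks, 3, labels=False)

# Option 2: keep the raw prices and let qcut drop repeated edges.
# This can yield fewer buckets than requested.
df["PriceBucketDrop"] = pd.qcut(df["sell_prix"], 3, duplicates="drop")

print(df)
print(df["PriceBucket"].value_counts().sort_index())
print(df["PriceBucketDrop"].cat.categories)
```

The two options trade off differently. Ranking keeps three roughly equal-sized buckets, but rows with the same price (29.00 here) end up in different buckets. Dropping duplicate edges keeps equal prices together, but on this sample only two buckets remain.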
luca