I need to find bin thresholds (e.g. 0-999, 1000-1999, etc.) such that each bin holds approximately an equal share of the total amount (1/n of the total, e.g. 1/3 if we split into 3 bins).

import pandas as pd

d = {'amount': [600, 400, 250, 340, 200, 500, 710]}
df = pd.DataFrame(data=d)
df

   amount
0     600
1     400
2     250
3     340
4     200
5     500
6     710

Expected output if we split into 3 bins by the sum of the amount column:

bin                          sum
threshold_1(x value-x value) 1000
threshold_2(x-x)             1000
threshold_3(x-x)             1000
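The 1000 per bin in the expected output is the total divided by the number of bins (3000 / 3). A small brute-force sketch over the sample data above (the names `amounts`, `target`, and `exact` are illustrative) checks whether any assignment of the seven values to 3 groups hits exactly 1000 each, even ignoring value ranges:

```python
from itertools import product

amounts = [600, 400, 250, 340, 200, 500, 710]
n = 3
target = sum(amounts) / n  # 3000 / 3 = 1000.0

# Try every assignment of each amount to one of n groups (3 ** 7 = 2187 cases)
exact = [
    groups
    for groups in product(range(n), repeat=len(amounts))
    if all(
        sum(a for a, g in zip(amounts, groups) if g == k) == target
        for k in range(n)
    )
]
print(target, len(exact))  # 1000.0 0
```

No exact split exists here (710 would need partners summing to 290, and no such combination exists), so only approximately equal bins are achievable. Brute force is only feasible for tiny inputs, since the number of assignments grows as n ** len(amounts).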

Something like this, but with the sum of the values instead of the count:

pd.cut(df.amount, 3).value_counts()

Maybe it could be solved in plain Python, not only with pandas?
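One way to get value thresholds whose sums are roughly equal is to sort the amounts, take the running total, and put each value into the bin where the midpoint of its slice of the running total falls. This is a heuristic sketch, not a guaranteed-optimal split; the names `target`, `mid`, `labels`, and `out` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'amount': [600, 400, 250, 340, 200, 500, 710]})
n = 3  # number of bins

s = df['amount'].sort_values()
target = s.sum() / n  # ideal sum per bin

# Midpoint of each value's slice of the running total (e.g. 200 -> 100, 250 -> 325)
mid = s.cumsum() - s / 2
# Bin index = how many whole targets fit below that midpoint
labels = (mid // target).astype(int).rename('bin')

# Thresholds (lowest/highest value) and the sum per bin
out = s.groupby(labels).agg(['min', 'max', 'sum'])
print(out)
```

The `min`/`max` columns are the thresholds of each bin. Because bins are contiguous value ranges, a single large value (710 here) cannot be split, so the sums can still be uneven; a bin can also come out empty if one value jumps over a whole target, and equal values may land in different bins.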

Kurasao
  • I don't understand what your question is. If you want to create equal-sized bins from the sum of all your values, then you already have your answer: sum up all your values and divide by the number of bins, and that gives you the size of your bins. – Bastian Apr 12 '21 at 08:37
  • @Bastian I edited the question: I need to find the threshold values of bins with equal sums. – Kurasao Apr 12 '21 at 08:40

1 Answer

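pd.cut splits the range of amount into 3 equal-width bins. If approximately equal amounts are enough, group by those bins and aggregate with sum instead of counting: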

df = df.groupby(pd.cut(df.amount, 3)).sum()
print(df)
                 amount
amount                 
(199.49, 370.0]     790
(370.0, 540.0]      900
(540.0, 710.0]     1310
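With an integer number of bins, pd.cut chooses equal-width intervals without looking at the sums. The edges it chose can be inspected with `retbins=True`; a minimal sketch on the question's sample data (`cats`, `edges`, and `sums` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'amount': [600, 400, 250, 340, 200, 500, 710]})

# retbins=True returns the computed bin edges alongside the binned values
cats, edges = pd.cut(df['amount'], 3, retbins=True)
print(edges)

# Sum per equal-width bin: the widths are equal, the sums are not
sums = df['amount'].groupby(cats, observed=True).sum()
print(sums)
```

The edges are 170 apart ((710 - 200) / 3), with the lowest edge pulled slightly below the minimum (199.49) so that 200 is included. That is why the totals come out as 790, 900 and 1310 rather than about 1000 each.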
jezrael
  • Almost what I need, but is it possible to split into more equal amounts? On my main df I get a poor result: 1455511239 in the first bin vs 29959759 in the last. – Kurasao Apr 12 '21 at 09:08
  • @Kurasao - In my opinion no, not if you need to use pandas methods. – jezrael Apr 12 '21 at 09:10
  • @Kurasao - In plain Python it is possible, but not so easy; see [this](https://stackoverflow.com/questions/42271009/subsets-having-same-sum-python) – jezrael Apr 12 '21 at 09:19
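If the bins must stay contiguous value ranges, one way to make them as even as possible is to minimize the largest bin sum: binary-search the largest allowed sum and test each candidate by greedily filling bins in ascending value order. This is a different technique from the subset-sum approach in the linked question, since it never mixes values across thresholds. The function name `equal_sum_thresholds` is hypothetical, and the sketch assumes non-negative integer amounts:

```python
def equal_sum_thresholds(values, n):
    """Hypothetical helper: split sorted values into at most n contiguous
    bins so that the largest bin sum is as small as possible.
    Assumes non-negative integer values."""
    vals = sorted(values)
    if not vals or n < 1:
        raise ValueError("need at least one value and one bin")

    def split(limit):
        # Greedily fill bins in ascending order without exceeding `limit`
        bins, current, running = [], [], 0
        for v in vals:
            if current and running + v > limit:
                bins.append(current)
                current, running = [], 0
            current.append(v)
            running += v
        if current:
            bins.append(current)
        return bins

    # The optimal limit lies between the largest single value and the total
    lo, hi = vals[-1], sum(vals)
    while lo < hi:
        mid = (lo + hi) // 2
        if len(split(mid)) <= n:
            hi = mid      # feasible: try a smaller limit
        else:
            lo = mid + 1  # needs too many bins: allow larger sums
    # Each bin as (lowest value, highest value, bin sum)
    return [(b[0], b[-1], sum(b)) for b in split(lo)]


print(equal_sum_thresholds([600, 400, 250, 340, 200, 500, 710], 3))
# [(200, 400, 1190), (500, 600, 1100), (710, 710, 710)]
```

Minimizing the largest bin is only one definition of "most equal": because earlier bins are filled first, the last bin can end up light, and fewer than n bins may be returned. The cost is about len(values) × log2(total) passes over the data, which stays manageable even for totals in the billions.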