What's the easiest way to split values into a predefined number of groups with roughly equal sums?
import pandas as pd
data = {'impact': [10, 30, 20, 10, 90, 60, 50, 40]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
print(df)
   impact
a      10
b      30
c      20
d      10
e      90
f      60
g      50
h      40
numgroups = 4
group_targetsum = round(df.impact.sum() / numgroups, -1)
print(group_targetsum)
80.0
In the case above, I'd like to create 4 groups from df. The only grouping criterion is that the sum of impact in each group should be approximately equal to group_targetsum. A group's sum can fall above or below group_targetsum within a reasonable margin.
Ultimately, I'd like to separate these groups into their own dataframes, preserving the index, resulting in something like this:
print(df_a)
   impact
e      90
print(df_b)
   impact
c      20
f      60
print(df_c)
   impact
a      10
d      10
g      50
print(df_d)
   impact
b      30
h      40
The resulting dataframes don't need to match this exactly, as long as each group's impact sum is as close as possible to group_targetsum.
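To make the goal concrete, here is a minimal sketch of one heuristic that might fit: take the values largest first and drop each one into whichever group currently has the smallest running sum. This is only a greedy approximation, not a guaranteed optimal partition. The helper name assign_groups, the temporary group column, and the groups dictionary are all made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'impact': [10, 30, 20, 10, 90, 60, 50, 40]},
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
numgroups = 4


def assign_groups(values, numgroups):
    """Hypothetical helper: greedy assignment, largest value first,
    always into the group with the smallest running sum."""
    sums = [0] * numgroups          # running sum of each group
    labels = {}                     # index label -> group number
    for idx, val in values.sort_values(ascending=False).items():
        target = sums.index(min(sums))   # first group with the lowest sum
        labels[idx] = target
        sums[target] += val
    # Return the labels in the original index order.
    return pd.Series(labels).reindex(values.index)


# Temporary column holding the group number of each row (illustrative name).
df['group'] = assign_groups(df['impact'], numgroups)

# One dataframe per group, original index preserved.
groups = {g: part.drop(columns='group') for g, part in df.groupby('group')}

for g, part in groups.items():
    print(g, part['impact'].sum())
    print(part)
```

On the sample data this gives group sums of 90, 80, 70 and 70, which is within 10 of group_targetsum for every group. The individual frames are then available as groups[0] through groups[3] rather than as separately named variables like df_a.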