I am working with a pandas DataFrame and trying to subset it so that the cumulative sum of the qty column does not exceed 18 and yellow rows make up at least 65% of the selection. I then want to repeat this for several iterations. However, sometimes the code gets stuck in an infinite loop, and when it does produce output, every iteration returns the same result.
Everything after the while line was taken from the post "Python random sample selection based on multiple conditions".
import pandas as pd

df = pd.DataFrame({'id': ['A','B','C','D','E','G','H','I','J','k','l','m','n','o'],
                   'color': ['red','red','orange','red','red','red','red','yellow','yellow','yellow','yellow','yellow','yellow','yellow'],
                   'qty': [5, 2, 3, 4, 7, 6, 8, 1, 5, 2, 3, 4, 7, 6]})
df_sample = df
for x in range(2):
    # shuffle all rows, then keep the leading rows whose running qty total is <= 30
    sample_s = df.sample(n=df.shape[0])
    sample_s = sample_s[sample_s.qty.cumsum() <= 30]
    sample_size = len(sample_s)
    # resample until the total qty is no more than 18
    while sum(df['qty']) > 18:
        yellow_size = 0.65
        df_yellow = df[df['color'] == 'yellow'].sample(int(yellow_size * sample_size))
        others_size = 1 - yellow_size
        df_others = df[df['color'] != 'yellow'].sample(int(others_size * sample_size))
        df = pd.concat([df_yellow, df_others]).sample(frac=1)
    print(df)
When it does work, this is the output I get; both iterations print the same result:
color   id  qty
red     H   2
yellow  n   3
yellow  J   5
red     G   2
yellow  I   1
red     D   4

color   id  qty
red     H   2
yellow  n   3
yellow  J   5
red     G   2
yellow  I   1
red     D   4
I would really appreciate any help resolving this issue.
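To make the requirement concrete, here is a rough sketch of the behaviour I am after. It is not the code from the linked post: it assumes simple rejection sampling is acceptable, that the 65% refers to the share of rows (not of qty), and that an empty selection does not count. The helper name sample_subset and its parameters (max_qty, min_yellow, seed, max_tries) are made up for illustration. Note that it never overwrites the original df and caps the number of attempts, so it cannot loop forever.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'B', 'C', 'D', 'E', 'G', 'H', 'I', 'J', 'k', 'l', 'm', 'n', 'o'],
    'color': ['red', 'red', 'orange', 'red', 'red', 'red', 'red',
              'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow'],
    'qty': [5, 2, 3, 4, 7, 6, 8, 1, 5, 2, 3, 4, 7, 6],
})


def sample_subset(frame, max_qty=18, min_yellow=0.65, seed=None, max_tries=1000):
    """Hypothetical helper: shuffle, keep the prefix within max_qty,
    and retry until the yellow row share is at least min_yellow."""
    for attempt in range(max_tries):
        # a different random_state per attempt keeps seeded runs reproducible
        state = None if seed is None else seed + attempt
        shuffled = frame.sample(frac=1, random_state=state)
        # qty is positive, so the cumulative sum is increasing and
        # this mask keeps a leading run of the shuffled rows
        subset = shuffled[shuffled['qty'].cumsum() <= max_qty]
        if len(subset) > 0 and (subset['color'] == 'yellow').mean() >= min_yellow:
            return subset
    return None  # no valid subset found within max_tries


# each iteration samples from the untouched original df
for i in range(2):
    print(sample_subset(df))
```

Passing seed makes a run repeatable; leaving it as None gives a fresh draw on every call, which is what the repeated iterations need.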