I want to make a random sample selection in python from the following df such that at least 65% of the resulting sample should have color yellow and cumulative sum of the quantities selected to be less than or equals to 18.
Original Dataset:
Date Id color qty
02-03-2018 A red 5
03-03-2018 B blue 2
03-03-2018 C green 3
04-03-2018 D yellow 4
04-03-2018 E yellow 7
04-03-2018 G yellow 6
04-03-2018 H orange 8
05-03-2018 I yellow 1
06-03-2018 J yellow 5
I have got total qty. selected condition covered but stuck on how to move forward with integrating the % condition:
df2 = df1.sample(n=df1.shape[0])
df3= df2[df2.qty.cumsum() <= 18]
Required dataset:
Date Id color qty
03-03-2018 B blue 2
04-03-2018 D yellow 4
04-03-2018 G yellow 6
06-03-2018 J yellow 5
Or something like this:
Date Id color qty
02-03-2018 A red 5
04-03-2018 D yellow 4
04-03-2018 E yellow 7
05-03-2018 I yellow 1
Any help would be really appreciated!
Thanks in advance.