I have a large DataFrame that has name and cost (£) columns. Is there a way to randomly select say 10 rows from this DataFrame that have a combined cost of no more than £100 for instance?
Asked
Active
Viewed 761 times
0
-
1you can refer this [question](https://stackoverflow.com/questions/55111741/python-random-sample-selection-based-on-multiple-conditions). It deals with the similar issue. – Arjun Sikhwal Sep 17 '20 at 14:59
-
please post sample data and desired output and what you have tried so far – Vivek Kalyanarangan Sep 17 '20 at 17:50
-
1Does this answer your question? [Random row selection in Pandas dataframe](https://stackoverflow.com/questions/15923826/random-row-selection-in-pandas-dataframe) – bart cubrich Sep 17 '20 at 20:15
-
Actually this is not a duplicate because of the combined cost stipulation. – bart cubrich Sep 17 '20 at 21:29
-
I think the title should be edited to something like`" How do you select random rows from a pandas DataFrame with column sum constraints"` to differentiate it from [this](https://stackoverflow.com/questions/55111741/python-random-sample-selection-based-on-multiple-conditions). I think this question is different, but the title makes it seem the same. Edit queue full or I would suggest it. – bart cubrich Sep 19 '20 at 19:49
1 Answers
3
It can be done like so, although you might want to set some conditions around the while loop like a maximum number of iterations, since you can easily run into a situation where no subset of your df will have a combined cost of less then 100
or whatever.
Edited on OP request
import pandas as pd
import numpy as np
df=pd.DataFrame({'cost':np.random.uniform(0,10,100)})
n=12 #number of rows to get (I had to do 12 not get infinite loop with my data)
cost_max=100
cost_min=90
cost=80
max_iterations=1000
i=1
while True:
df_random = df.sample(n)
cost=df_random.cost.sum()
#print(cost)
if cost_min<cost<cost_max:
break
i+=1
if i>max_iterations:
#print('max iterations ({}) reached'.format(max_iterations))
break

bart cubrich
- 1,184
- 1
- 14
- 41
-
Brilliant thanks thats working perfectly. I have now also incorperated an if statement into the while loop that says if cost<90: cost=cost_max+1 to make a minimum cost of the items. Is this the most efficient way to set up a minimum constraint in you opinion? Thanks for your help with everything Im very new to the community – Jamiephilpott88 Sep 18 '20 at 10:23
-
I'm not sure I understand what that is doing? The minimum cost is set by changing the `cost_max` variable. Are you saying you also want the cost to be greater than 90? I can edit the script to allow that as well if you want. Also, consider up voting and checking answered for this question if I answered it ;) – bart cubrich Sep 18 '20 at 17:25
-
Basically I want the cost to be between 90 and 100. The if statement was meant to say that if the selection of n rows has a cost of under 900, reset cost to cost_max+1 so the while loop isnt broken - I have tried to upvote but it says that as I am a new member it wont show apologies – Jamiephilpott88 Sep 19 '20 at 14:11
-
I think this is what you want. I have some commented out print statements you can use there for debugging. Notice that I added a max iterations thing as well, so you don't create an infinite loop. – bart cubrich Sep 19 '20 at 19:44