0

I have a large DataFrame that has name and cost (£) columns. Is there a way to randomly select say 10 rows from this DataFrame that have a combined cost of no more than £100 for instance?

Adrian Mole
  • 49,934
  • 160
  • 51
  • 83
  • 1
    you can refer this [question](https://stackoverflow.com/questions/55111741/python-random-sample-selection-based-on-multiple-conditions). It deals with the similar issue. – Arjun Sikhwal Sep 17 '20 at 14:59
  • please post sample data and desired output and what you have tried so far – Vivek Kalyanarangan Sep 17 '20 at 17:50
  • 1
    Does this answer your question? [Random row selection in Pandas dataframe](https://stackoverflow.com/questions/15923826/random-row-selection-in-pandas-dataframe) – bart cubrich Sep 17 '20 at 20:15
  • Actually this is not a duplicate because of the combined cost stipulation. – bart cubrich Sep 17 '20 at 21:29
  • I think the title should be edited to something like`" How do you select random rows from a pandas DataFrame with column sum constraints"` to differentiate it from [this](https://stackoverflow.com/questions/55111741/python-random-sample-selection-based-on-multiple-conditions). I think this question is different, but the title makes it seem the same. Edit queue full or I would suggest it. – bart cubrich Sep 19 '20 at 19:49

1 Answers1

3

It can be done like so, although you might want to set some conditions around the while loop like a maximum number of iterations, since you can easily run into a situation where no subset of your df will have a combined cost of less then 100 or whatever.

Edited on OP request

import pandas as pd
import numpy as np
df=pd.DataFrame({'cost':np.random.uniform(0,10,100)})

n=12 #number of rows to get (I had to do 12 not get infinite loop with my data)
cost_max=100
cost_min=90
cost=80
max_iterations=1000
i=1

while True:
    
    
    df_random = df.sample(n)
    cost=df_random.cost.sum()
    #print(cost)
    
    if cost_min<cost<cost_max:
        break
    
    i+=1
    if i>max_iterations:
        #print('max iterations ({}) reached'.format(max_iterations))
        break
bart cubrich
  • 1,184
  • 1
  • 14
  • 41
  • Brilliant thanks thats working perfectly. I have now also incorperated an if statement into the while loop that says if cost<90: cost=cost_max+1 to make a minimum cost of the items. Is this the most efficient way to set up a minimum constraint in you opinion? Thanks for your help with everything Im very new to the community – Jamiephilpott88 Sep 18 '20 at 10:23
  • I'm not sure I understand what that is doing? The minimum cost is set by changing the `cost_max` variable. Are you saying you also want the cost to be greater than 90? I can edit the script to allow that as well if you want. Also, consider up voting and checking answered for this question if I answered it ;) – bart cubrich Sep 18 '20 at 17:25
  • Basically I want the cost to be between 90 and 100. The if statement was meant to say that if the selection of n rows has a cost of under 900, reset cost to cost_max+1 so the while loop isnt broken - I have tried to upvote but it says that as I am a new member it wont show apologies – Jamiephilpott88 Sep 19 '20 at 14:11
  • I think this is what you want. I have some commented out print statements you can use there for debugging. Notice that I added a max iterations thing as well, so you don't create an infinite loop. – bart cubrich Sep 19 '20 at 19:44