I want to get a random value from a set. The code below doesn't work.

I get this error:

  File "/usr/lib/python2.7/random.py", line 320, in sample
    raise ValueError("sample larger than population")
ValueError: sample larger than population

I have no idea what this means. I want to get an integer so that I can add it to another set, so that in the end I have N randomly chosen elements from one set copied into another.

import csv
import random

def getRandomBook():
    bookset = getBookSet()
    random_number = random.sample(bookset,1)
    print random_number[0]
    return_number = random_number[0]
    return return_number


def getBookSet(sales_input=open("data/sales_3yr.csv", "r")):
    sales = csv.reader(sales_input)
    bookID = set()
    lineNumber = 0    
    for line in sales:
        id = line[6]
        if lineNumber<>0:
            bookID.add(eval(id))
        lineNumber=1
    return bookID
Asked by Sven Bamberger, edited by Evhz

1 Answer

It means your set is empty. The set is empty because your getBookSet() function reads from a file object that an earlier call has already read to the end.
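A quick way to confirm this reading of the error is to ask random.sample for one element from an empty population. The sketch below is written for Python 3 (the traceback in the question comes from Python 2.7, whose message wording differs slightly), and pick_one() is a hypothetical helper. It converts the set to a sorted list first because Python 3.11 and later no longer accept a set as the population.

```python
import random

def pick_one(items):
    """Return one random element of a set, as getRandomBook() tries to do."""
    # Python 3.11+ requires a sequence, so turn the set into a sorted list.
    population = sorted(items)
    return random.sample(population, 1)[0]

print(pick_one({7, 8, 9}))  # one of 7, 8 or 9

try:
    pick_one(set())  # an empty set: nothing to sample from
except ValueError as exc:
    # Same exception type as in the question; the wording varies by version.
    print("ValueError:", exc)
```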

You are opening the file as part of the function definition. Default argument expressions are evaluated once, when the def statement runs, so every call shares the same file object and only the first call gets any data out of it. Do not use function default parameters for expressions you want to be evaluated each time the function is called. See "Least Astonishment" and the Mutable Default Argument.
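The effect can be reproduced without the real CSV file. In this Python 3 sketch, a hypothetical read_ids() function takes an io.StringIO object as its default argument; the StringIO is an in-memory stand-in for data/sales_3yr.csv, assumed here only so the example is self-contained. Because the default is created once, the second call sees an object that has already been read to the end.

```python
import csv
import io

# Hypothetical stand-in for data/sales_3yr.csv: a header row plus two sales,
# with the book ID in column 6 as in the question.
SAMPLE_CSV = "c0,c1,c2,c3,c4,c5,book_id\nx,x,x,x,x,x,101\nx,x,x,x,x,x,102\n"

def read_ids(source=io.StringIO(SAMPLE_CSV)):
    # The default StringIO is built ONCE, when this def statement runs,
    # and every call that omits 'source' reuses that same object.
    reader = csv.reader(source)
    next(reader, None)  # skip the header row
    return {int(row[6]) for row in reader}

first = read_ids()   # consumes the shared object to its end
second = read_ids()  # same object, nothing left to read

print(sorted(first))   # [101, 102]
print(sorted(second))  # []
```

An empty set like `second` is exactly what getRandomBook() then hands to random.sample.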

Do this instead:

def getBookSet(sales_input="data/sales_3yr.csv"):
    sales = csv.reader(open(sales_input, 'rb'))
    bookID = set()
    lineNumber = 0    
    for line in sales:
        id = line[6]
        if lineNumber<>0:
            bookID.add(eval(id))
        lineNumber=1
    return bookID

Now the function opens a new file object each time it is called, so reading starts from the first byte again. The sales_input parameter is now an immutable string, used to open the file when the function runs instead of when it is defined.
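To check the fix, the sketch below writes a small CSV with made-up contents to a temporary file and calls a Python 3 port of the corrected function twice. One adjustment is needed for Python 3: the csv module expects a file opened in text mode with newline='' rather than 'rb', which is the Python 2 idiom used in this answer. The port also uses int() instead of eval().

```python
import csv
import os
import tempfile

def getBookSet(sales_input="data/sales_3yr.csv"):
    # The default is a plain string; the file is opened anew on every call,
    # so each call starts reading from the first byte.
    with open(sales_input, newline="") as sales_file:
        sales = csv.reader(sales_file)
        next(sales, None)  # skip the header row
        return {int(row[6]) for row in sales}

# Made-up sample data with the book ID in column 6.
rows = "c0,c1,c2,c3,c4,c5,book_id\nx,x,x,x,x,x,101\nx,x,x,x,x,x,102\n"
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    f.write(rows)

try:
    first = getBookSet(path)
    second = getBookSet(path)  # reads the file again from the start
    print(sorted(first), sorted(second))  # [101, 102] [101, 102]
finally:
    os.remove(path)
```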

Correcting a few other issues:

def getBookSet(sales_input="data/sales_3yr.csv"):
    with open(sales_input, 'rb') as sales_file:
        sales = csv.reader(sales_file)
        next(sales, None)  # skip the first line
        return {int(row[6]) for row in sales}
  • The next() call consumes the first line (the headers) for us, and the set comprehension that loops over sales continues from the line after it.
  • Don't use <>; it is deprecated (Python 3 removed it). You could test if lineNumber, if lineNumber > 0 or if lineNumber != 0 instead (in order of preference).
  • Don't use eval() when int() or float() will do just fine; eval() executes whatever Python expression happens to be in that CSV column.
  • Use a with statement so the file is closed automatically when the block of code is done.
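For the original goal, putting N randomly chosen elements of one set into another, repeated calls to getRandomBook() are not strictly needed: random.sample() can draw N distinct elements in one call. A minimal Python 3 sketch follows; pick_random_books() and its n and target parameters are hypothetical names, and the book IDs are made up. Capping n at the population size avoids the same ValueError when the source set is smaller than requested, at the cost of silently returning fewer items.

```python
import random

def pick_random_books(book_ids, n, target=None):
    """Add up to n distinct, randomly chosen IDs from book_ids to target."""
    if target is None:
        target = set()  # fresh set per call, not a shared mutable default
    # random.sample needs a sequence (Python 3.11+), and sampling more
    # items than exist raises ValueError, so cap n at the population size.
    population = sorted(book_ids)
    k = min(n, len(population))
    target.update(random.sample(population, k))
    return target

all_books = {101, 102, 103, 104, 105}
chosen = pick_random_books(all_books, 3)
print(len(chosen), chosen <= all_books)  # 3 True
```

The target=None default follows the advice above: a default of set() would be created once and shared between calls, just like the file object in the question.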
Answered by Martijn Pieters