Create Population Data

Question

Create a population of 5000 individuals and their number of phone pick-ups per day to be used for later sampling. Here's the code I came up with:

def get_population(pickups, pop_size, std):
    pop = np.random.randint(0, pickups, 5000)
    mean = np.mean(pop)
    std = np.std(pop)
    return pop, mean, std

I used the given assertion errors to come up with the function, so I'm not even sure what I should be returning:

pop_pickups, pop_mean, pop_std = get_population(45, 5000, 42)
assert np.abs(pop_mean - 45) < 0.5, "Get population problem, testing pop_mean, population mean returned does not match expected population mean"
assert np.abs(pop_std - np.sqrt(45)) < 0.5, "Get population problem, testing pop_std, population standard deviation returned does not match expected standard deviation"

I assumed I needed to generate the population, then take the mean and std of said pop. But my code triggered the assertion error for incorrect mean. The ultimate goal is to visualize the bootstrap of a single mean.

lbrissot · Accepted Answer · 2021-09-01T15:44:12.683


Check the documentation of numpy.random.randint: https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html

You're generating 5000 integers drawn uniformly from 0 to 44 (the upper bound of randint is exclusive). The mean will be around 45/2, not around 45. Your code runs fine, but the asserts test for a mean and standard deviation that a uniform draw on that range cannot produce.
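To see this concretely, here is a quick check of what the uniform draw gives. This is only a sketch: the seed is an arbitrary choice for repeatability, and the names uniform_mean and uniform_std are mine, not part of the assignment.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the illustration is repeatable

# Same call as in the question: integers from 0 to 44, upper bound excluded
pop = np.random.randint(0, 45, 5000)

uniform_mean = pop.mean()  # close to 22, nowhere near 45
uniform_std = pop.std()    # close to 13, nowhere near sqrt(45) ~ 6.7

print(uniform_mean, uniform_std)
```

Both values are far outside the 0.5 tolerance of the asserts, so no seed will make the uniform version pass.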

Edit: If you want to generate a sample following a Poisson distribution, you can use this:

import numpy as np

param = 45  # Poisson parameter
nb_samples = 50000
sample = np.random.poisson(param, nb_samples)

assert np.abs(param - np.mean(sample)) < 0.5, "Get population problem, testing pop_mean, population mean returned does not match expected population mean"

Note that this code runs silently: an assert that passes produces no output.
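Putting this back into the shape of the function from the question might look like the sketch below. Two assumptions on my part: the third argument (42 in the test call) is treated as a random seed rather than a standard deviation, which is only a guess from its value, and the Poisson parameter is taken to be the pickups argument.

```python
import numpy as np

def get_population(pickups, pop_size, seed=None):
    """Draw pop_size daily pick-up counts from a Poisson(pickups) distribution.

    Treating the third argument as a seed is an assumption, not something
    the assignment text confirms.
    """
    rng = np.random.RandomState(seed)     # legacy API, in line with np.random.poisson above
    pop = rng.poisson(pickups, pop_size)  # Poisson: mean = pickups, variance = pickups
    return pop, np.mean(pop), np.std(pop)

pop_pickups, pop_mean, pop_std = get_population(45, 5000, 42)

# The two checks from the question, with the messages shortened
assert np.abs(pop_mean - 45) < 0.5, "population mean is off"
assert np.abs(pop_std - np.sqrt(45)) < 0.5, "population standard deviation is off"
```

The std test is the giveaway that a Poisson distribution is expected: its variance equals its mean, so the standard deviation is sqrt(45).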

edited Sep 01 '21 at 15:44

answered Sep 01 '21 at 14:18


Thanks! I guess I'm a little confused, since the assert functions make it seem like the mean should be less than 0.5 away from 45. The assert tests were given by the instructors of this assignment and cannot be edited. – Hefe Sep 01 '21 at 14:22
The assert here raises an error when the absolute value of the difference between pop_mean and 45 is 0.5 or more. It makes sure that the mean of the random sample is not too different from what you expect. If you reduce the size of the sample (5000), you should also increase the margin of error (0.5). – lbrissot Sep 01 '21 at 14:27
Got it. I appreciate the explanation a lot, but I'm honestly still confused on how to change my code to pass the assert test! – Hefe Sep 01 '21 at 14:44
Apparently I need to use a Poisson distribution – Hefe Sep 01 '21 at 15:26
I edited the answer to give you an example with Poisson. – lbrissot Sep 01 '21 at 15:44
How would you recommend sampling this hypothetical population, taking just a random sample of 30 individuals? Again a Poisson distribution? – Hefe Sep 01 '21 at 16:06
I tried np.random.choice(population, 30). – Hefe Sep 01 '21 at 16:14

You should note in your answer that many `numpy.random.*` functions (including `randint` and `poisson`) have become legacy functions now that NumPy 1.17 introduced a new PRNG system. However, such functions haven't been deprecated and will remain available for the time being due to backward compatibility. See also: https://stackoverflow.com/questions/67703875/np-random-binomial-vs-random-choices-for-simulating-coin-flips/67704191#67704191 – Peter O. Oct 16 '21 at 14:58
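For reference, the same steps with the newer Generator API mentioned in the last comment, including the sample of 30 asked about above and a bootstrap of its mean. This is a rough sketch: the seed, the 1000 resamples and the variable names are arbitrary choices of mine. Also note that np.random.choice(population, 30) samples with replacement by default; replace=False is used here on the assumption that 30 distinct individuals are wanted.

```python
import numpy as np

rng = np.random.default_rng(42)  # new-style generator; seed chosen arbitrarily

# Population: 5000 daily pick-up counts, Poisson with mean 45
population = rng.poisson(45, 5000)

# Random sample of 30 distinct individuals from that population
sample = rng.choice(population, size=30, replace=False)

# Bootstrap of a single mean: resample the 30 values WITH replacement
# many times and record the mean of each resample
n_boot = 1000  # arbitrary number of resamples
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(n_boot)
])

print(sample.mean(), boot_means.mean(), boot_means.std())
```

A histogram of boot_means is then the usual way to visualize the bootstrap distribution of the mean.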
