
I am trying to generate random numbers from a normal distribution. I want to replace any numbers that do not fall within two standard deviations of the mean, so that in the end all of the numbers in the array fall within this range. This is the code I have so far:

mean = 150
COV = 0.4
sd = COV*mean
upper_limit = mean + 2*sd
lower_limit = mean - 2*sd
capacity = np.random.normal(mean, sd, size = (1,96))
for x in capacity:
    while x > upper_limit:
        x = np.random.normal(mean, sd, size = 1)
    while x < lower_limit:
        x = np.random.normal(mean, sd, size = 1)

However, I get the error message `ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`.
Can anyone help me fix this?

Mr. T
Gigi
  • Is it that `capacity` is a matrix, so each `x` in `capacity` is a row, but you're treating it like an individual element? – Noah Jan 07 '21 at 17:38
  • https://stackoverflow.com/questions/18441779/how-to-specify-upper-and-lower-limits-when-using-numpy-random-normal – Chris Jan 07 '21 at 17:41

3 Answers


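I think you should change the `size` parameter from `(1, 96)` to `96`. With `size=(1, 96)`, `capacity` has a single row, so the `x` you get when iterating over it is an array of shape `(96,)`. Comparing that array with a single float gives an array of booleans, which a `while` condition cannot reduce to one truth value.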
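A minimal sketch of the difference, assuming NumPy is imported as `np`. The `rng` generator and its seed are illustrative choices made here for repeatability, not part of the question.

```python
import numpy as np

mean = 150
sd = 0.4 * mean
upper_limit = mean + 2 * sd
lower_limit = mean - 2 * sd

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable

# With size=(1, 96) the array has one row, so iterating yields that whole row.
capacity_2d = rng.normal(mean, sd, size=(1, 96))
first_item = next(iter(capacity_2d))
print(first_item.shape)  # (96,)

# With size=96 the array is one-dimensional and each element is a scalar.
capacity = rng.normal(mean, sd, size=96)
for i in range(capacity.size):
    # Assign to capacity[i], not to a loop variable, so the array itself changes.
    while capacity[i] > upper_limit or capacity[i] < lower_limit:
        capacity[i] = rng.normal(mean, sd)
```

Note that rebinding the loop variable, as in `x = np.random.normal(...)`, would not modify `capacity` even after the shape is fixed, which is why the sketch assigns by index.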

Contestosis

Don't iterate through a NumPy array to do something to each element. The whole point of using NumPy is to make this faster by avoiding explicit Python loops.

To check all values in capacity which are greater than upper_limit, just do this:

capacity > upper_limit

Then, you can get the indices of those items this way:

too_high_indices = np.where(capacity > upper_limit)
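To make the two steps above concrete, here is a small sketch with made-up values (the array contents and the fixed `upper_limit` are assumptions for illustration). It shows that the comparison yields a boolean array and that `np.where` returns a tuple with one index array per dimension, so the number of matches is not the length of that tuple.

```python
import numpy as np

upper_limit = 270.0
# Made-up values in a (1, 5) array, mirroring the question's single-row shape.
capacity = np.array([[100.0, 300.0, 150.0, 280.0, 290.0]])

mask = capacity > upper_limit   # boolean array with the same shape as capacity
indices = np.where(mask)        # tuple: (row indices, column indices)

n_dims = len(indices)                 # 2, the number of dimensions
n_matches = np.count_nonzero(mask)    # 3, the number of values above the limit

print(n_dims)       # 2
print(indices[1])   # [1 3 4]
print(n_matches)    # 3
```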

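Then you can generate new random values and assign them to all of those positions. Note that `np.where` returns a tuple of index arrays, so the number of matches is the size of one of those arrays, not the length of the tuple:

capacity[too_high_indices] = np.random.normal(mean, sd, size=too_high_indices[0].size)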

Putting it together, repeat until no values are too high:

too_high_indices = np.where(capacity > upper_limit)
while too_high_indices[0].size > 0:
    capacity[too_high_indices] = np.random.normal(
        mean, sd, size=too_high_indices[0].size)
    too_high_indices = np.where(capacity > upper_limit)

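Then repeat for the lower limit. Keep in mind that a value redrawn for one limit can land outside the other, so it is safer to test both bounds in a single condition, e.g. `(capacity > upper_limit) | (capacity < lower_limit)`.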

This way, it will be relatively fast even if the size grows.
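The approach above can be sketched as a single loop that checks both limits at once. The function name `sample_within_limits`, its `n_sd` and `rng` parameters, and the seed are hypothetical names introduced here for illustration; a boolean mask is used in place of `np.where` so that the number of values to redraw is simply the count of `True` entries.

```python
import numpy as np

def sample_within_limits(mean, sd, size, n_sd=2.0, rng=None):
    """Draw normal samples and redraw any that fall outside mean +/- n_sd * sd."""
    rng = np.random.default_rng() if rng is None else rng
    lower, upper = mean - n_sd * sd, mean + n_sd * sd
    values = rng.normal(mean, sd, size=size)
    out_of_range = (values < lower) | (values > upper)
    while out_of_range.any():
        # Redraw only the offending entries; the count comes from the mask.
        values[out_of_range] = rng.normal(mean, sd, size=out_of_range.sum())
        out_of_range = (values < lower) | (values > upper)
    return values

# Same parameters as the question: mean 150, sd 0.4 * 150, shape (1, 96).
capacity = sample_within_limits(150, 0.4 * 150, size=(1, 96),
                                rng=np.random.default_rng(0))
```

Because out-of-range values are redrawn rather than clipped, the result follows a normal distribution truncated to the two limits.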

zvone
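
# Redraw each element in place until it lies within both limits.
# capacity has shape (1, 96), so index its single row explicitly.
# print(capacity)
# changed = set()
for i in range(capacity.shape[1]):
    while capacity[0, i] > upper_limit or capacity[0, i] < lower_limit:
        capacity[0, i] = np.random.normal(mean, sd)  # no size: returns a scalar
        # changed.add(i)
# print(capacity)
# print(changed)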