How can I replace numbers in an array that fall above an upper bound or below a lower bound in Python?

Question

I am trying to randomly generate numbers from a normal distribution. I want to replace any numbers that do not fall within two standard deviations of the mean, so that in the end all of the numbers in the array fall within this range. This is the code I have so far:

import numpy as np

mean = 150
COV = 0.4
sd = COV*mean
upper_limit = mean + 2*sd
lower_limit = mean - 2*sd
capacity = np.random.normal(mean, sd, size = (1,96))
for x in capacity:
    while x > upper_limit:
        x = np.random.normal(mean, sd, size = 1)
    while x < lower_limit:
        x = np.random.normal(mean, sd, size = 1)

However, I get the error message `ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`.
Can anyone help me fix this?

Is it that `capacity` is a matrix, so each `x` in `capacity` is a row, but you're treating it like an individual element? — Noah, Jan 07 '21 at 17:38
https://stackoverflow.com/questions/18441779/how-to-specify-upper-and-lower-limits-when-using-numpy-random-normal — Chris, Jan 07 '21 at 17:41

score 1 · Answer 1 · answered Jan 07 '21 at 17:40


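I think you should change the size parameter from (1, 96) to 96. With size=(1, 96), capacity is a 2-D array holding a single row, so the loop runs once and x is that whole row, with shape (96,). An array of 96 values cannot be compared to a single float inside a while condition, which is what raises the ValueError.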
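A quick way to see this is to check what the loop actually yields for each size. The script below is a sketch that uses a fixed seed (an assumption, only for reproducibility); it shows that a (1, 96) array yields one row of shape (96,), that using that row as a `while` condition raises the same ValueError, and that a 1-D array of size 96 yields individual numbers.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only so the run is reproducible

mean, sd = 150, 0.4 * 150
upper_limit = mean + 2 * sd

# size=(1, 96): a 2-D array, so iterating yields rows, not numbers
capacity_2d = np.random.normal(mean, sd, size=(1, 96))
rows = list(capacity_2d)
print(len(rows), rows[0].shape)  # 1 (96,)

# A whole row used as a loop condition is ambiguous, hence the ValueError
raised = False
try:
    while rows[0] > upper_limit:
        break
except ValueError as err:
    raised = True
    print("ValueError:", err)

# size=96: a 1-D array, so iterating yields individual scalar values
capacity_1d = np.random.normal(mean, sd, size=96)
elems = list(capacity_1d)
print(len(elems), np.ndim(elems[0]))  # 96 0
```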

answered by Contestosis

just validate my answer then plz :) Glad it helped – Contestosis Jan 07 '21 at 17:52
I just realized that although changing the size allowed the code to run, it did not replace any values outside of two standard deviations away – Gigi Jan 07 '21 at 18:05
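The reason nothing was replaced is that `x = np.random.normal(...)` only rebinds the loop variable `x` to a new object; it never writes back into `capacity`. A minimal sketch with a small made-up array (an assumption, standing in for the random draws) shows the difference between rebinding the name and assigning through an index.

```python
import numpy as np

values = np.array([1.0, 500.0, 3.0])  # made-up values for illustration

# Rebinding the loop variable only changes the local name x,
# not the element stored in the array
for x in values:
    if x > 100:
        x = 0.0
after_rebinding = values.tolist()
print(after_rebinding)  # [1.0, 500.0, 3.0]

# Assigning through an index writes into the array itself
for i in range(len(values)):
    if values[i] > 100:
        values[i] = 0.0
print(values.tolist())  # [1.0, 0.0, 3.0]
```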

score 1 · Accepted Answer · answered Jan 07 '21 at 17:46

Don't iterate through a NumPy array in Python to do something to each element. The point of NumPy is that vectorized operations run the loop in compiled code, which is much faster than a Python-level loop.

To find all values in capacity that are greater than upper_limit, just compare the whole array at once:

capacity > upper_limit
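The comparison returns a boolean array with the same shape as capacity, which you can count or use directly as an index. A small sketch with hypothetical values in place of the random draws (upper_limit is 270 for mean=150 and COV=0.4):

```python
import numpy as np

upper_limit = 270.0  # mean + 2*sd for mean=150, COV=0.4
capacity = np.array([[120.0, 300.0, 150.0, 280.0]])  # hypothetical values

# Element-wise comparison: True wherever the value is above the limit
mask = capacity > upper_limit
print(mask.tolist())             # [[False, True, False, True]]
print(mask.sum())                # 2
print(capacity[mask].tolist())   # [300.0, 280.0]
```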

Then, you can get the indices of those items this way:

too_high_indices = np.where(capacity > upper_limit)
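To see what np.where returns here, the sketch below uses hypothetical values in a (1, 5) array. With a single argument, np.where gives a tuple with one index array per dimension, so the tuple's length is the number of dimensions while the number of matches is the size of any one of its arrays.

```python
import numpy as np

upper_limit = 270.0
capacity = np.array([[120.0, 300.0, 150.0, 280.0, 310.0]])  # hypothetical values

too_high_indices = np.where(capacity > upper_limit)
print([idx.tolist() for idx in too_high_indices])  # [[0, 0, 0], [1, 3, 4]]
print(len(too_high_indices))     # 2: one index array per dimension
print(too_high_indices[0].size)  # 3: number of elements above the limit
```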

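Then you can draw one new random value per matching position and assign them all at once. np.where returns one index array per dimension, so the number of matches is too_high_indices[0].size rather than len(too_high_indices):

capacity[too_high_indices] = np.random.normal(mean, sd, size=too_high_indices[0].size)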

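In the end, you loop until no values are above the limit:

too_high_indices = np.where(capacity > upper_limit)
while too_high_indices[0].size > 0:
    # Redraw only the positions that are still too high
    capacity[too_high_indices] = np.random.normal(
        mean, sd, size=too_high_indices[0].size)
    too_high_indices = np.where(capacity > upper_limit)

Testing the size of the index array is safer than np.any(too_high_indices): np.any looks at the index values themselves, so if the only offending element sat at index 0 in every dimension, the loop would stop early.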

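Then repeat for the lower limit. Be aware that a value redrawn in one pass can land outside the other bound, so it is safer to handle both limits in a single loop using the combined condition (capacity > upper_limit) | (capacity < lower_limit).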

Because each pass is vectorized and only redraws the out-of-range values, this stays fast even as the array grows.
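Putting the pieces together, here is a minimal sketch that checks both bounds with one boolean mask, so a redraw that escapes through the other bound is caught on the next pass. The function name clip_by_redrawing and its parameters are hypothetical, and it uses NumPy's Generator API (np.random.default_rng) instead of the legacy np.random.normal from the question; both sample the same normal distribution.

```python
import numpy as np


def clip_by_redrawing(mean, sd, size, k=2.0, rng=None):
    """Draw normal samples, redrawing any outside mean +/- k*sd until none remain."""
    rng = np.random.default_rng() if rng is None else rng
    lower_limit = mean - k * sd
    upper_limit = mean + k * sd

    samples = rng.normal(mean, sd, size=size)
    # One mask for both bounds, so a redraw cannot slip out the other side
    out_of_range = (samples < lower_limit) | (samples > upper_limit)
    while out_of_range.any():
        # Redraw only the offending positions, all at once
        samples[out_of_range] = rng.normal(mean, sd, size=out_of_range.sum())
        out_of_range = (samples < lower_limit) | (samples > upper_limit)
    return samples


mean = 150
COV = 0.4
sd = COV * mean
capacity = clip_by_redrawing(mean, sd, size=96, rng=np.random.default_rng(42))
print(capacity.min() >= mean - 2 * sd, capacity.max() <= mean + 2 * sd)  # True True
```

Since roughly 95% of normal draws fall within two standard deviations, each pass only has to redraw about 5% of the values it touches, so the loop typically finishes after a few iterations. The result follows a normal distribution truncated at the two limits.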

score 0 · Answer 3 · answered Jan 07 '21 at 17:55

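# Assumes capacity, mean, sd and both limits from the question (capacity has shape (1, 96))
for i in range(capacity.shape[1]):
    # Redraw this element until it lands inside [lower_limit, upper_limit];
    # indexing into the array stores the new value in capacity itself
    while capacity[0, i] > upper_limit or capacity[0, i] < lower_limit:
        capacity[0, i] = np.random.normal(mean, sd)
print(capacity)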
