
I have a (17520, 5) array of zeros that I want to fill with two values: 0 and 0.05. There are two conditions, and I am using np.where for them; however, the second condition should only be applied at specific indices of the array. The code I am using is as follows:

import numpy as np

independent = np.zeros([17520,5])
# One-argument np.where returns a tuple of index arrays (rows, cols)
w1 = np.where(independent == 0)
independent[w1] = np.random.choice([0.0, 0.05], size=len(w1[0]))

This part of the code works fine: it fills the zero array (independent) with the desired values, 0 and 0.05, in roughly equal proportions (50/50). The second condition, on the other hand, needs to be applied only at specific indices, something like this:

for n in range(0, 365):
    start = 24 + n*48
    end = 46 + n*48
    w2 = np.where(independent == 0.05)
    independent[w2][start:end,0:5]=np.random.choice([0.0, 0.05], (22,5),size=len(w2[0]))

Here, [start:end,0:5] indicates the indices (rows start to end-1, all five columns) where I want to apply the condition w2.

I would appreciate help with the correct way to use np.where together with these indices, because at the moment I get the following error:

 SyntaxError: invalid syntax
Asked by Jonathan Budez (edited by yatu)
  • The problem is in `np.random.choice()`: both `(22,5)` and `len(w2[0])` are passed to the same parameter, `size`. – dome May 17 '19 at 09:45
  • I kept only (22,5) and now I get the following error: IndexError: too many indices for array – Jonathan Budez May 17 '19 at 09:56
  • Now the problem is in `independent[w2][start:end,0:5]`. You are giving too many indices: the second pair of square brackets should contain only one of `start:end` or `0:5`. Alternatively, remove the first pair of square brackets. – dome May 17 '19 at 10:00
  • `independent[w2]` is a copy, not a view – hpaulj May 17 '19 at 11:41
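The last two comments can be checked on a small array. The sketch below uses a toy 6×5 array rather than the real 17520×5 one. It shows that indexing with the tuple returned by one-argument np.where produces a new 1-D array, so slicing it with two indices raises IndexError, and assigning into that copy leaves the original array untouched:

```python
import numpy as np

# Toy array: rows 0, 2 and 4 hold 0.05, the rest are 0
independent = np.zeros((6, 5))
independent[::2] = 0.05

w2 = np.where(independent == 0.05)  # tuple: (row indices, column indices)
picked = independent[w2]            # advanced indexing -> new 1-D array
print(picked.shape)                 # 15 elements, one dimension

try:
    picked[0:2, 0:5]                # two indices on a 1-D array
except IndexError as exc:
    print("IndexError:", exc)

# Assigning into the copy does not modify `independent`
independent[w2][0:3] = 0.0
print(independent[0, :3])           # row 0 still holds 0.05
```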

1 Answer

Note that np.where can also take two array_like arguments, x and y: it returns elements from x where the condition is True and from y elsewhere. Here's how you could use np.where in your case:

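for n in range(0, 365):
    start = 24 + n*48
    end = 46 + n*48
    # Where the value is 0.05 take a fresh random draw, elsewhere keep the
    # current value; then keep only rows start:end of the result.
    # Note: this draws a random value for every cell of the full array on
    # each of the 365 iterations.
    independent[start:end, 0:5] = np.where(independent == 0.05,
                                           np.random.choice([0.0, 0.05],
                                                            size=independent.shape),
                                           independent)[start:end, 0:5]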
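As a minimal illustration of the three-argument form used in the loop (toy values, not the question's data): wherever the condition holds, the result takes the element from the second argument, and elsewhere from the third.

```python
import numpy as np

a = np.array([[0.0, 0.05],
              [0.05, 0.0]])
replacement = np.array([[1.0, 2.0],
                        [3.0, 4.0]])

# Take `replacement` where a == 0.05, otherwise keep the value from `a`
out = np.where(a == 0.05, replacement, a)
# out is [[0.0, 2.0], [3.0, 0.0]]
```

In the loop, `replacement` is a fresh random draw of 0.0/0.05, so cells that held 0.05 are redrawn and cells that held 0 stay 0.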

The loop can be vectorized, although it's a little tricky. The key is to get a single list of all the rows where independent should be updated. For that we can use n_ranges from the linked answer, which builds a flat array containing every range defined by the corresponding start and end values:

start = 24 + np.arange(0, 365)*48
end = 46 + np.arange(0, 365)*48
ranges = n_ranges(start, end)
independent[ranges, 0:5] = np.where(independent == 0.05,
                                    np.random.choice([0.0, 0.05],
                                                     size=independent.shape),
                                    independent)[ranges, 0:5]
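The snippet above depends on n_ranges from the linked answer, which is not reproduced here. If that helper is unavailable, the following is a hypothetical stand-in (not the linked implementation): it returns the concatenation of np.arange(s, e) for every start/end pair without a Python-level loop, assuming end >= start elementwise.

```python
import numpy as np

def n_ranges(start, end):
    """Hypothetical stand-in for the linked `n_ranges` helper.

    Equivalent to np.concatenate([np.arange(s, e) for s, e in zip(start, end)])
    but without a Python loop. Assumes end[i] >= start[i] for every i.
    """
    start = np.asarray(start)
    end = np.asarray(end)
    lengths = end - start
    # Position in the flat output where each range begins
    offsets = np.cumsum(lengths) - lengths
    # 0, 1, ..., len-1, restarting at every range boundary
    within = np.arange(lengths.sum()) - np.repeat(offsets, lengths)
    return np.repeat(start, lengths) + within


start = 24 + np.arange(0, 365) * 48
end = 46 + np.arange(0, 365) * 48
ranges = n_ranges(start, end)  # 365 windows of 22 rows; the first is rows 24..45
```

The trick is that each output element equals its range's start plus its offset within that range, and both terms can be built with np.repeat and np.cumsum.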

Checking the timings, we can see that the second approach is roughly 250x faster (475 ms vs 1.87 ms per call):

import numpy as np

def vect_approach(a):
    # All window rows at once, then a single np.where over the whole array
    start = 24 + np.arange(0, 365)*48
    end = 46 + np.arange(0, 365)*48
    ranges = n_ranges(start, end)
    a[ranges, 0:5] = np.where(a == 0.05,
                              np.random.choice([0.0, 0.05], size=a.shape),
                              a)[ranges, 0:5]

def loopy_approach(x):
    # One np.where (and one full-size random draw) per window: 365 of them
    for n in range(0, 365):
        start = 24 + n*48
        end = 46 + n*48
        x[start:end, 0:5] = np.where(x == 0.05,
                                     np.random.choice([0.0, 0.05],
                                                      size=x.shape),
                                     x)[start:end, 0:5]

independent = np.zeros([17520,5])

%timeit loopy_approach(independent)
# 475 ms ± 19.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit vect_approach(independent)
# 1.87 ms ± 95.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Answered by yatu
  • Thank you for your answer. The first solution works, but when I try the second one I get the following error: NameError: name 'n_ranges' is not defined – Jonathan Budez May 17 '19 at 10:49
  • Yes @JonathanBudez, follow the link I attached: `n_ranges` is an answer of mine in another post. Use that function; it'll speed up your implementation :) – yatu May 17 '19 at 10:50
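A further option that avoids n_ranges altogether, offered as a sketch that is not part of the original answer: every window here has the same shape (rows 24 to 45 of each consecutive 48-row block, and 365 × 48 = 17520 covers the array exactly), so the rows to update can be selected with a modulo mask. The sketch swaps in np.random.default_rng (seeded only for reproducibility) for np.random.choice, and it redraws only the cells that need it instead of drawing a full-size random array.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so runs are reproducible

# First condition from the question: fill every cell with 0.0 or 0.05 (about 50/50)
independent = rng.choice([0.0, 0.05], size=(17520, 5))
before = independent.copy()  # kept only to compare afterwards

# Row r is inside a window when 24 <= r % 48 < 46,
# i.e. start = 24 + n*48, end = 46 + n*48 for n = 0..364
rows = np.arange(independent.shape[0])
in_window = (rows % 48 >= 24) & (rows % 48 < 46)

# Second condition: inside the windows, cells that hold 0.05 are redrawn
redraw = in_window[:, None] & (independent == 0.05)
independent[redraw] = rng.choice([0.0, 0.05], size=int(redraw.sum()))
```

This only works because the windows are evenly spaced; for arbitrary start/end pairs, n_ranges (or an equivalent) is still needed.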