
I'd like to compare each value x of an array with a rolling window of the n previous values. More precisely, I'd like to see at which percentile this new value x would be if we added it to the previous window:

import numpy as np
A = np.array([1, 4, 9, 28, 28.5, 2, 283, 3.2, 7, 15])
print(A)
n = 4  # window width
for i in range(len(A)-n):
    W = A[i:i+n]  # window of the n values preceding A[i+n]
    x = A[i+n]
    q = sum(W <= x) * 1.0 / n  # fraction of the window that is <= x
    print('Value:', x, ' Window before this value:', W, ' Quantile:', q)

[ 1. 4. 9. 28. 28.5 2. 283. 3.2 7. 15. ]
Value: 28.5 Window before this value: [ 1. 4. 9. 28.] Quantile: 1.0
Value: 2.0 Window before this value: [ 4. 9. 28. 28.5] Quantile: 0.0
Value: 283.0 Window before this value: [ 9. 28. 28.5 2. ] Quantile: 1.0
Value: 3.2 Window before this value: [ 28. 28.5 2. 283. ] Quantile: 0.25
Value: 7.0 Window before this value: [ 28.5 2. 283. 3.2] Quantile: 0.5
Value: 15.0 Window before this value: [ 2. 283. 3.2 7. ] Quantile: 0.75

Question: What is the name of this computation? Is there a clever numpy way to compute this more efficiently on arrays of millions of items (with n that can be ~5000)?


Note: here is a simulation for 1M items and n=5000 but it would take ~ 2 hours:

import numpy as np
A = np.random.random(1000*1000)  # the following is not very interesting with a [0,1]
n = 5000                         # uniform random variable, but anyway...
Q = np.zeros(len(A)-n)
for i in range(len(Q)):
    Q[i] = sum(A[i:i+n] <= A[i+n]) * 1.0 / n
    if i % 100 == 0:
        print("%.2f %% already done." % (i * 100.0 / len(A)))

print(Q)

Note: this is not similar to "How to compute moving (or rolling, if you will) percentile/quantile for a 1d array in numpy?"

Basj

5 Answers


Your code is so slow because you're using Python's built-in sum() instead of numpy.sum() or numpy.ndarray.sum(); Python's sum() has to convert every raw value to a Python object before doing the calculation, which is really slow. Just by changing sum(...) to np.sum(...) or (...).sum(), the runtime drops to under 20 seconds.
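For instance, the loop from the question with just that one change (a minimal sketch, reusing the question's A, n and Q):

import numpy as np

A = np.random.random(1000 * 1000)
n = 5000
Q = np.zeros(len(A) - n)
for i in range(len(Q)):
    # np.sum works on the boolean array directly, without boxing every
    # element into a Python object the way the built-in sum() does
    Q[i] = np.sum(A[i:i+n] <= A[i+n]) / n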

Aleksi Torhamo

You can use np.lib.stride_tricks.as_strided, as in the accepted answer of the question you linked. With the first example you give, it is pretty easy to understand:

A = np.array([1, 4, 9, 28, 28.5, 2, 283, 3.2, 7, 15])
n = 4
print(np.lib.stride_tricks.as_strided(A, shape=(n, A.size-n),
                                      strides=(A.itemsize, A.itemsize)))
# you get A.size-n columns, each holding the n rolling elements
array([[  1. ,   4. ,   9. ,  28. ,  28.5,   2. ],
       [  4. ,   9. ,  28. ,  28.5,   2. , 283. ],
       [  9. ,  28. ,  28.5,   2. , 283. ,   3.2],
       [ 28. ,  28.5,   2. , 283. ,   3.2,   7. ]])

Now, to do the calculation, you can compare this array to A[n:], sum along each column (axis 0) and divide by n:

print((np.lib.stride_tricks.as_strided(A, shape=(n, A.size-n),
                                       strides=(A.itemsize, A.itemsize))
          <= A[n:]).sum(0) / (1.*n))
[1.   0.   1.   0.25 0.5  0.75] # same answer as the loop in the question

Now the problem is the size of your data (several million values with n around 5000): the boolean array produced by the comparison has shape (n, A.size-n), which is several GB, so you probably can't use this method directly. One way around that is to chunk the data. Let's define a function

def compare_strides(arr, n):
    # for each position, count how many of the n previous values are <= the current value
    return (np.lib.stride_tricks.as_strided(arr, shape=(n, arr.size-n),
                                            strides=(arr.itemsize, arr.itemsize))
            <= arr[n:]).sum(0)

and process the data chunk by chunk, gluing the results together with np.concatenate (and don't forget to divide by n):

nb_chunk = 1000  # this number depends on the capacity of your computer;
                 # not sure how to optimize it
Q = np.concatenate([compare_strides(A[chunk*nb_chunk:(chunk+1)*nb_chunk+n], n)
                    for chunk in range(0, A[n:].size//nb_chunk + 1)]) / (1.*n)

I can't run the 1M / n=5000 test, but with 5000 values and n=100, see the difference in timeit:

A = np.random.random(5000)
n = 100

%%timeit
Q = np.zeros(len(A)-n)
for i in range(len(Q)):
    Q[i] = sum(A[i:i+n] <= A[i+n]) * 1.0 / n

#1 loop, best of 3: 6.75 s per loop

%%timeit
nb_chunk = 100
Q1 = np.concatenate([compare_strides(A[chunk*nb_chunk:(chunk+1)*nb_chunk+n], n)
                    for chunk in range(0, A[n:].size//nb_chunk + 1)])/(1.*n)

#100 loops, best of 3: 7.84 ms per loop

# check for equality
print((Q == Q1).all())
# True

See the difference in time: from 6750 ms down to 7.84 ms. Hopefully it scales to the bigger data.
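As a side note that is not part of the original answer: NumPy 1.20+ ships np.lib.stride_tricks.sliding_window_view, which builds the same kind of windowed view without computing the strides by hand. A minimal sketch on the small example above (the comparison still materializes a large boolean array, so big inputs would still need the chunking shown here):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

A = np.array([1, 4, 9, 28, 28.5, 2, 283, 3.2, 7, 15])
n = 4

W = sliding_window_view(A, n)[:-1]       # row i is the window A[i:i+n]
Q = (W <= A[n:, None]).sum(axis=1) / n   # fraction of each window <= the value after it
print(Q)                                 # [1.   0.   1.   0.25 0.5  0.75]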

Ben.T
  • Here [`np.lib.stride_tricks.as_strided(A, shape=(len(A)-5,6), strides=(A.itemsize,A.itemsize))`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.lib.stride_tricks.as_strided.html) creates a view, as a 2D matrix, with all the consecutive windows. It gives the exact same result as [`skimage.util.shape.view_as_windows(A, (6,))`](http://scikit-image.org/docs/dev/api/skimage.util.html#skimage.util.view_as_windows). Do you know what the precise/computational differences are, @Ben.T? – Basj Dec 07 '18 at 16:20
  • @Basj if you look at the code [source](https://github.com/scikit-image/scikit-image/blob/master/skimage/util/shape.py#L106) behind the link you gave, the `skimage` method actually uses `as_strided` from `numpy` under the hood, so it makes sense that the result is the same (a small equality check is sketched below). – Ben.T Dec 10 '18 at 14:42
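To illustrate the equivalence discussed in this comment thread, here is a minimal check (not from either post; it assumes scikit-image is installed):

import numpy as np
from skimage.util import view_as_windows  # requires scikit-image

A = np.array([1, 4, 9, 28, 28.5, 2, 283, 3.2, 7, 15])

# windows of length 6, one per row, built two ways
W_strided = np.lib.stride_tricks.as_strided(
    A, shape=(len(A) - 5, 6), strides=(A.itemsize, A.itemsize))
W_skimage = view_as_windows(A, (6,))

print(np.array_equal(W_strided, W_skimage))  # True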

Using np.sum instead of sum was already mentioned, so my only remaining suggestion is to additionally consider pandas and its rolling window function, to which you can apply any arbitrary function:

import numpy as np
import pandas as pd

A = np.random.random(1000*1000)
df = pd.DataFrame(A)
n = 5000

def fct(x):
    # x holds the n previous values plus the current value as its last element
    return np.sum(x[:-1] <= x[-1]) * 1.0 / (len(x)-1)

# raw=True makes pandas pass plain ndarrays to fct, so x[-1] indexes positionally
percentiles = df.rolling(n+1).apply(fct, raw=True)
print(percentiles)
SpghttCd
  • Great solution too! It takes ~ 25 seconds for me for 1M items and n=5000, like with AleksiTorhamo's answer. – Basj Nov 03 '18 at 22:44
  • Nice to know. However, on my cell phone it looks like this approach introduces a performance drawback against your loop. But perhaps this can be balanced by immediately having all results in a dataframe ready for further processing. – SpghttCd Nov 03 '18 at 22:54

Additional benchmark: a comparison between the strides solution (Ben.T's answer) and the plain np.sum solution (Aleksi Torhamo's answer):

import numpy as np, time

A = np.random.random(1000*1000)
n = 5000

def compare_strides(arr, n):
    return (np.lib.stride_tricks.as_strided(arr, shape=(n, arr.size-n), strides=(arr.itemsize, arr.itemsize)) <= arr[n:]).sum(0)

# Test #1: with strides ===> 11.0 seconds
t0 = time.time()
nb_chunk = 10*1000
Q = np.concatenate([compare_strides(A[chunk*nb_chunk:(chunk+1)*nb_chunk+n], n) for chunk in range(0, A[n:].size//nb_chunk + 1)])/(1.*n)
print(time.time() - t0, Q)

# Test #2: with just np.sum ===> 18.0 seconds
t0 = time.time()
Q2 = np.zeros(len(A)-n)
for i in range(len(Q2)):
    Q2[i] = np.sum(A[i:i+n] <= A[i+n])
Q2 *= 1.0 / n  # here the multiplication is vectorized; if instead, we move this multiplication to the previous line: np.sum(A[i:i+n] <= A[i+n]) * 1.0 / n, it is 6 seconds slower
print(time.time() - t0, Q2)

print(all(Q == Q2))

There's also another (better) way, with numba and its @jit decorator. It is then much faster: only 5.4 seconds!

from numba import jit
import numpy as np

@jit  # if you remove this line, it is much slower (similar to Test #2 above)
def doit():
    A = np.random.random(1000*1000)
    n = 5000
    Q2 = np.zeros(len(A)-n)
    for i in range(len(Q2)):
        Q2[i] = np.sum(A[i:i+n] <= A[i+n])
    Q2 *= 1.0/n
    print(Q2)

doit()

When adding numba parallelization, it's even faster: 1.8 seconds!

import numpy as np
from numba import jit, prange

@jit(parallel=True)
def doit(A, Q, n):
    for i in prange(len(Q)):
        Q[i] = np.sum(A[i:i+n] <= A[i+n])

A = np.random.random(1000*1000)
n = 5000
Q = np.zeros(len(A)-n)    
doit(A, Q, n)
Basj
  • if you want to use `np.sum`, you can even improve the speed by doing it this way: `Q = np.array([np.less_equal(A[i:i+n],A[i+n]).sum() for i in range(len(A)-n)],dtype=float)/n` (written out as a runnable sketch after these comments) – Ben.T Nov 05 '18 at 19:40
  • Thanks @Ben.T, I tried it and it takes ~ 17 seconds for me, great! How do you explain the 30% speed improvement in comparison with the loop + np.sum version? – Basj Nov 05 '18 at 21:52
  • From what I know, using a list comprehension directly in the creation of an array is faster than creating an empty array and then having to access it at each iteration to change the value (accessing a value in an array takes time). The other reason is that dividing the whole array by n, rather than each value, is faster because the division is vectorized. It also seems that imposing `dtype=float` when creating the array was faster than converting an array of `int` to `float` (memory reasons?). Using `np.less_equal` or `<=` was not a significant improvement for me. – Ben.T Nov 05 '18 at 22:01
  • @Ben.T Thanks for this, I edited the answer. I also tried with numba, look at the result ;) – Basj Nov 06 '18 at 10:39
  • That is a pretty good improvement compared to your original solution :) – Ben.T Nov 06 '18 at 13:17
  • @Ben.T ... and another x3 improvement thanks to numba parallelization ;) – Basj Nov 06 '18 at 13:21
  • ha ha, always faster :) Just out of curiosity, as I'm not familiar with Numba: would it be possible (and interesting) to use it on the solution with strides? – Ben.T Nov 06 '18 at 15:51
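For reference, the one-liner from the first comment above, written out as a self-contained sketch (same A and n as in the question):

import numpy as np

A = np.random.random(1000 * 1000)
n = 5000

# build the result array directly from a list comprehension and divide once
# at the end, as suggested in the comment above
Q = np.array([np.less_equal(A[i:i+n], A[i+n]).sum() for i in range(len(A) - n)],
             dtype=float) / n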

You can use np.quantile instead of sum(A[i:i+n] <= A[i+n]) * 1.0 / n. That may be as good as it gets. Not sure if there really is a better approach for your question.

Ravi Patel
  • Are you sure it would give the same thing? I don't think so. `np.quantile(W, 0.5)` would give the **value** x such that W < x for p=50% of the values in W. Here I'm looking for the opposite: knowing x, I'm looking for p (see the sketch after these comments). – Basj Nov 03 '18 at 21:43
  • Yep, you're right, sorry! `quantile` is in fact not the inverse of `percentile`, as in my haste I assumed; they are almost the same. – Ravi Patel Nov 04 '18 at 02:32
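As a side note on the naming question and the distinction made in these comments: the quantity computed in the question is the percentile rank of x within the window, and scipy.stats.percentileofscore with kind='weak' returns exactly the fraction of window values <= x, scaled to 0-100 (SciPy is an extra dependency, not used anywhere in the original post):

import numpy as np
from scipy.stats import percentileofscore  # SciPy, not used in the original post

W = np.array([1, 4, 9, 28.0])  # first window from the question's example
x = 28.5

q_manual = np.sum(W <= x) / len(W)                    # 1.0, as in the question's output
q_scipy = percentileofscore(W, x, kind='weak') / 100  # 'weak' counts values <= x
print(q_manual, q_scipy)                              # 1.0 1.0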