0

Is anyone familiar with a faster way to increment spans of indexes, as done by the following patch(ar, ranges)? In other words, it counts overlapping ranges, somewhat like a histogram. Maybe there is a vectorized method that already does something similar?

import numpy as np
ar = np.zeros(10)
ranges = [(1, 2), (4, 7), (6, 8)]
add_this = 1

def patch(ar, ranges):
    for start, end in ranges:
        ar[start:end] += add_this
    return ar

print np.arange(10)*1.
print patch(ar, ranges)

Output:

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
[ 0.  1.  0.  0.  1.  1.  2.  1.  0.  0.]

My problem is similar to: - Pandas: Overlapping time count - How to identify the maximum number of overlapping date ranges?

Community
  • 1
  • 1
Yariv
  • 12,945
  • 19
  • 54
  • 75

1 Answers1

1

I think there is no such method in numpy, how every it's very easy to write a cython function to speedup the calculation:

Create the random range first:

import numpy as np

N = 1000
idx = np.random.randint(0, N, (100000, 2)).astype(np.uint64)
idx.sort(axis=1)
tidx = [tuple(x) for x in idx.tolist()]

Python for loop:

%%time
a = np.zeros(N)
for s,e in tidx:
    a[s:e] += 1

output:

CPU times: user 459 ms, sys: 1e+03 µs, total: 460 ms
Wall time: 461 ms

define cython function:

%%cython
import cython

@cython.wraparound(False)
@cython.boundscheck(False)
def patch(double[::1] a, size_t[:, ::1] idx, double v):
    cdef size_t i, j
    for i in range(idx.shape[0]):
        for j in range(idx[i, 0], idx[i, 1]):
            a[j] += v

calculate by cython:

%%time
a2 = np.zeros(N)
patch(a2, idx, 1)

output:

CPU times: user 18 ms, sys: 0 ns, total: 18 ms
Wall time: 17.7 ms

check results:

np.all(a == a2)

output:

True
HYRY
  • 94,853
  • 25
  • 187
  • 187