1

I have the following data:

A = np.array([1,2,3,4,5,6,7,8,9,10])
ind = np.array([0,1,4])
beg = 3

A and ind typically hold a few million elements each.

I want to modify the elements of A at the indices ind + beg:

for i in range(0,ind.size):
    A[ind[i]+beg] += 1

Since the operation on A (+1) costs about as much as adding beg to ind[i], I want to avoid that extra addition.

In C, I usually do this with an offset pointer:

int* ptA = A + beg;  /* ptA[k] is A[k + beg], matching the Python loop above */
for(int i=0; i<indsize; i++) ptA[ind[i]]++;

Is it possible to do something similar in Python, or should I stick with the first version?

cs95
Dohyun

2 Answers

4

I think the equivalent of your C approach is A[beg:][ind] += 1. A[beg:] is a view into A, so it plays the role of the offset pointer and saves the additions. np.add.at is an unbuffered version, needed if ind has repeated values; it is generally slower.

import numpy as np
A = np.arange(10010)
ind = np.unique(np.random.randint(0, 10000, 1000))  # unique, so no repeated indices
beg = 3

In [236]: %timeit for i in range(0,ind.size): A[ind[i]+beg] += 1
3.01 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [237]: %timeit A[beg+ind]+=1
39.8 µs ± 5.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [238]: %timeit A[beg:][ind]+=1
33.3 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [239]: %timeit np.add.at(A[beg:],ind,1)
151 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
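
To make the repeated-values caveat concrete, here is a minimal sketch on a small made-up array (the values of A and ind below are illustrative, not the benchmark data). It compares the buffered += on a view with np.add.at when one index appears twice:

```python
import numpy as np

A = np.zeros(8, dtype=int)
ind = np.array([0, 1, 1, 4])  # illustrative data: index 1 appears twice
beg = 3

# A[beg:] is a view, so fancy-indexed += on it writes through to the base array.
buffered = A.copy()
buffered[beg:][ind] += 1  # the repeated index is incremented only once

# np.add.at applies the increment once per occurrence of each index.
unbuffered = A.copy()
np.add.at(unbuffered[beg:], ind, 1)

print(buffered.tolist())    # [0, 0, 0, 1, 1, 0, 0, 1]
print(unbuffered.tolist())  # [0, 0, 0, 1, 2, 0, 0, 1]
```

If ind is known to be unique, as in the timings above, both forms give the same result and the faster += can be used.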

Numba or Cython can speed this operation up further:

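import numba

@numba.njit
def addat(A, beg, ind, amount):
    u = A[beg:]  # view into A, so the updates below modify A in place
    for i in ind:
        u[i] += amount  # explicit loop, so repeated indices are each applied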

In [249]: %timeit addat(A,beg,ind,1)
3.13 µs ± 688 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
B. M.
0

NumPy has powerful indexing features, which are documented here: https://docs.scipy.org/doc/numpy/user/basics.indexing.html

In your case you can do:

>>> A[ind+beg] += 1

This adds beg to each element of ind, producing a temporary index array, then indexes into A at those locations and increments each of them.
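
As a quick check, here is a small sketch that applies this to the data from the question and compares it with the original loop (the names A_loop and A_vec are only for illustration):

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
ind = np.array([0, 1, 4])
beg = 3

# Reference result: the explicit loop from the question.
A_loop = A.copy()
for i in range(ind.size):
    A_loop[ind[i] + beg] += 1

# Vectorized form: increments A[3], A[4] and A[7] in one step.
A_vec = A.copy()
A_vec[ind + beg] += 1

print(A_vec.tolist())  # [1, 2, 3, 5, 6, 6, 7, 9, 9, 10]
assert np.array_equal(A_loop, A_vec)
```

The two agree here because ind has no repeated values; with duplicates, the vectorized += would increment each location only once.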

mtrw