
I have a NumPy array A and I want to modify values in it using an index array B. The catch is that the same element of A can appear in B multiple times. This example shows what I mean:

import numpy as np
A = np.arange(5) + 0.5
B = np.array([0, 1, 0, 2, 0, 3, 0, 4])
print(A[B])

This prints [ 0.5 1.5 0.5 2.5 0.5 3.5 0.5 4.5], as expected. However, if I do this:

A[B] += 1.
print(A)

I expected to get [ 4.5 2.5 3.5 4.5 5.5], since the first element appears 4 times in the index array B, but I get [ 1.5 2.5 3.5 4.5 5.5] instead. How can I get the result I actually wanted, without using a loop? (I'm doing this on very large arrays.)

Thomas Leonard
  • I don't understand. Your expected result is `[ 4.5 2.5 3.5 4.5]`, which has only 4 elements, and you have 5 indices in B. Why is one element missing? – Marcin Jan 29 '15 at 01:50
  • I ran your example on a simple A (i.e. not random) to get reproducible results: http://pastebin.com/mm2CcmK7 It seems to work OK for me, if I understand you correctly. – Marcin Jan 29 '15 at 02:03
  • @Marcin A is not random in my case... it's arange(5) + 0.5. Anyway, it behaves the same on your example: I was expecting to get `[15 23 34 45 56]`, not `[12 23 34 45 56]`. As the first element of A is repeated 4 times in the index array B, it should go through the +1 operation 4 times... at least that's what I want to do in the end. Any idea how? – Thomas Leonard Jan 29 '15 at 02:08
  • Ah OK, now I understand what you mean. Then check Jaime's answer. It seems to solve the problem. – Marcin Jan 29 '15 at 02:10

1 Answer


The explanation of why this happens is a little involved, but basically, "buffering ate your homework." There are a couple of ways around this issue with NumPy ufuncs. The proper one, which works with any operation, is to use the at method of the corresponding ufunc:

>>> A = np.arange(5) + 0.5
>>> B = np.array([0, 1, 0, 2, 0, 3, 0, 4])
>>> np.add.at(A, B, 1)
>>> A
array([ 4.5,  2.5,  3.5,  4.5,  5.5])
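For context, here is a minimal sketch of why the in-place form drops the repeats, and of np.add.at with an array of values instead of a scalar. The expanded read-modify-write sequence is an approximation of what A[B] += 1 does, and the array C is an illustrative assumption (the np.arange(8) case raised in the comments), not part of the example above.

```python
import numpy as np

B = np.array([0, 1, 0, 2, 0, 3, 0, 4])

# A[B] += 1 behaves roughly like this read-modify-write sequence:
A = np.arange(5) + 0.5
tmp = A[B]   # fancy indexing returns a copy; index 0 is read 4 times
tmp += 1     # every copy is incremented exactly once
A[B] = tmp   # index 0 is written 4 times, each time with the same value
buffered = A.copy()

# np.add.at applies the addition once per occurrence of each index.
A = np.arange(5) + 0.5
C = np.arange(8)  # hypothetical per-index increments, same length as B
np.add.at(A, B, C)
unbuffered = A.copy()

print(buffered)    # the repeats of index 0 are lost
print(unbuffered)  # index 0 received 0 + 2 + 4 + 6
```

The same at method exists on other binary ufuncs (np.multiply.at, np.maximum.at, and so on), which is why it is the general-purpose route.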

This tends to be rather slow, so if you need the best possible performance, and only for addition, you can use np.bincount:

>>> A = np.arange(5) + 0.5
>>> A += np.bincount(B) * 1  # replace the 1 with the number you want to add
>>> A
array([ 4.5,  2.5,  3.5,  4.5,  5.5])
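To make the trick less opaque, this small sketch shows what np.bincount returns for B and how a scalar increment scales it. The increment k is a made-up value for illustration.

```python
import numpy as np

B = np.array([0, 1, 0, 2, 0, 3, 0, 4])

# counts[i] is the number of times the value i occurs in B.
counts = np.bincount(B)

A = np.arange(5) + 0.5
k = 2.5  # hypothetical scalar increment
A += counts * k  # element i receives k once per occurrence of i in B

print(counts)
print(A)
```

This only lines up with A when counts has the same length as A, which holds here because the largest index in B is len(A) - 1.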

EDIT

If what you want to add is an array of the same length as B, then the following, using bincount, is probably going to run faster than the first method:

>>> A = np.arange(5) + 0.5
>>> C = np.ones_like(B)  # They are all ones, but could be anything
>>> A += np.bincount(B, weights=C)
>>> A
array([ 4.5,  2.5,  3.5,  4.5,  5.5])
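With weights that are not all ones, a quick cross-check against np.add.at can be sketched as follows. The values in C are an assumption chosen for illustration (np.arange(8), as in the comments below).

```python
import numpy as np

B = np.array([0, 1, 0, 2, 0, 3, 0, 4])
C = np.arange(8)  # illustrative increments, one per entry of B

# With weights, bincount sums the weights that share an index
# and returns a float64 array.
sums = np.bincount(B, weights=C)

A_bincount = np.arange(5) + 0.5
A_bincount += sums

# Reference result from the general-purpose method.
A_at = np.arange(5) + 0.5
np.add.at(A_at, B, C)

same = np.allclose(A_bincount, A_at)
print(sums)
print(same)
```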
Jaime
  • Thank you. That works perfectly, but not exactly for what I want to do in the end... my example was too simple. If I have another array C: `C = np.arange(8)`, what I want to do (this doesn't work, but it shows the point) is something along these lines: `A[B] += C`. So I'm not just adding a single value, but an array of the same size as B. Is that clear, or is it confusing? – Thomas Leonard Jan 29 '15 at 02:17
  • Sorry, your first solution actually does the job... as I wanted something efficient, I skipped straight to the second one. – Thomas Leonard Jan 29 '15 at 02:19
  • See my edit, it is likely faster than using `.at` for addition. – Jaime Jan 29 '15 at 04:09
  • One issue I still have with the `bincount` method is that the index array (B) must contain each of the indices of A at least once, otherwise you get a shape error: `ValueError: operands could not be broadcast together`. The `.at` method doesn't have this requirement. – Thomas Leonard Jan 29 '15 at 15:25
  • You can use `bincount`'s `minlength` argument to fix that, so something like `np.bincount(B, weights=C, minlength=len(A))` should do it. But as you are noticing, it gets hacky: unless this is a bottleneck for you, the proper thing to do is to use `.at`. It is a very recent addition, so it will likely get faster in future releases of NumPy. – Jaime Jan 29 '15 at 17:58
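A minimal sketch of the `minlength` fix from the last comment, using a made-up index array that does not reach the end of A. It assumes every entry of B is a non-negative integer smaller than len(A).

```python
import numpy as np

A = np.arange(5) + 0.5
B = np.array([0, 0, 2])        # hypothetical: indices 1, 3 and 4 never appear
C = np.array([1.0, 2.0, 3.0])  # hypothetical increments, one per entry of B

# Without minlength the result only covers indices 0..B.max(),
# so it cannot be added to A (length 3 versus length 5).
short = np.bincount(B, weights=C)

# minlength pads the result with zeros up to len(A).
padded = np.bincount(B, weights=C, minlength=len(A))
A += padded

print(short.shape, padded.shape)
print(A)
```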