
I'm trying to compute a histogram using NumPy array indexing (without explicit iteration over the array). Just to check whether it works as expected, I ran the following test:

import numpy as np

arr  = np.zeros(10)
inds = np.array([1,2,3,1,3,5,3])
arr[inds] += 1.0
print(arr)

The result is

[ 0. 1. 1. 1. 0. 1. 0. 0. 0. 0.] instead of

[ 0. 2. 1. 3. 0. 1. 0. 0. 0. 0.].

(i.e. indices that appear multiple times in the index array are counted only once)

I'm not sure whether there is a reason for this behavior (perhaps to make these operations order-independent and therefore easier to parallelize).

Is there another way to do this in NumPy?

Prokop Hapala
  • What your script does is add +1 to the arr indices specified in inds, i.e. at indices (1, 2, 3, 5) – Jalo Nov 24 '16 at 10:40

2 Answers


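When the same index appears more than once in an in-place operation on a NumPy array, the element at that index is not updated once per occurrence: all occurrences read the same original value and write back the same result, so the element changes only once. This is intentional and is mentioned in the documentation as well: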

>>> a = np.arange(5)
>>> a[[0,0,2]] += 1
>>> a
array([1, 1, 3, 3, 4])

> Even though 0 occurs twice in the list of indices, the 0th element is only incremented once. This is because Python requires a += 1 to be equivalent to a = a + 1.
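To make the quoted explanation concrete, here is a minimal sketch showing that `a[idx] += 1` behaves like an explicit gather / add / scatter sequence. The helper name `buffered_increment` is hypothetical and exists only for illustration; it is not a NumPy function.

```python
import numpy as np

def buffered_increment(a, idx):
    # Hypothetical helper that spells out what a[idx] += 1 does.
    tmp = a[idx]   # fancy indexing returns a copy, one entry per index (duplicates included)
    tmp = tmp + 1  # each entry of the copy is incremented independently
    a[idx] = tmp   # scatter back; duplicate indices all write the same value
    return a

a = np.arange(5)
b = np.arange(5)
a[[0, 0, 2]] += 1
buffered_increment(b, [0, 0, 2])
print(a)  # [1 1 3 3 4]
print(b)  # [1 1 3 3 4]
```

Because both entries for index 0 in the temporary copy started from the same original value, writing them back twice still leaves that element incremented only once.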

Mazdak
  • You are not answering the OP question – Jalo Nov 24 '16 at 10:53
  • Actually, it answers my question very well... I wanted to know not only how to make the histogram, but also, in general, how to use NumPy indexing in the future, and "why" it is like that. – Prokop Hapala Nov 24 '16 at 11:00
  • @ProkopHapala I suppose you wanted to know why it was not working. However, I think that the answer has to be focused on the main question. Further clarifications are of course welcome, but this info alone does not make an answer valid, in my judgement – Jalo Nov 24 '16 at 11:03
  • `add.at` is designed to get around this buffering issue. – hpaulj Nov 24 '16 at 16:41
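As the last comment notes, `np.add.at` performs unbuffered in-place addition, so every occurrence of a repeated index is accumulated. A minimal sketch applying it to the question's data:

```python
import numpy as np

arr = np.zeros(10)
inds = np.array([1, 2, 3, 1, 3, 5, 3])

# Unbuffered in-place addition: each occurrence of an index adds 1.0
np.add.at(arr, inds, 1.0)
print(arr)  # [0. 2. 1. 3. 0. 1. 0. 0. 0. 0.]
```

Unlike `arr[inds] += 1.0`, this produces the histogram the question expected, at the cost of being slower than the buffered fancy-indexing form.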

The OP's script adds +1 only once to each arr index specified in inds, i.e. at indices (1, 2, 3, 5).

A NumPy function well suited to what you need is numpy.bincount(). Since the result of this function has size inds.max() + 1, you will have to slice arr to select the indices that the counts are added to; otherwise the shapes will not match.

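import numpy as np

arr  = np.zeros(10)
inds = np.array([1,2,3,1,3,5,3])
values = np.bincount(inds)    # count of each index; length is inds.max() + 1
print(values)
arr[:values.size] += values   # add the counts to the matching leading slice of arr
print(arr)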

values will be:

[0 2 1 3 0 1]

and arr will take the form:

array([ 0., 2., 1., 3., 0., 1., 0., 0., 0., 0.])
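The slice is only needed because `bincount` returns `inds.max() + 1` bins by default. A minimal sketch of an alternative, assuming every index is non-negative and smaller than `arr.size`: passing `minlength=arr.size` makes the counts exactly as long as `arr`, so they can be added directly.

```python
import numpy as np

arr = np.zeros(10)
inds = np.array([1, 2, 3, 1, 3, 5, 3])

default_counts = np.bincount(inds)
print(default_counts.size)  # 6, i.e. inds.max() + 1

# minlength pads the result with zeros up to arr.size
# (assumes every index is in the range 0 .. arr.size - 1)
counts = np.bincount(inds, minlength=arr.size)
arr += counts
print(arr)  # [0. 2. 1. 3. 0. 1. 0. 0. 0. 0.]
```

Note that `minlength` only pads: if some index were greater than or equal to `arr.size`, the result would be longer than `arr` and the addition would fail with a shape mismatch.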

Jalo