I have an index that I use to assign elements into an array. Sometimes the index has repeated entries; in that case I would like to keep the smallest of the corresponding values. Is this possible?

import numpy as np
index = [0,3,5,5]
dist = [1,1,1,3]
arr = np.zeros(6)
arr[index] = dist
print(arr)

what I get:

[ 1.  0.  0.  1.  0.  3.]

what I would like to get:

[ 1.  0.  0.  1.  0.  1.]

addendum

Actually, I have a third array, values, holding the (vector) values to be inserted. So the problem is to insert rows from values into arr at the positions given by index, as in the following. However, when several rows share the same index, I want to choose the one with the minimum dist.

index = [0,3,5,5]
dist = [1,1,1,3]
values = np.arange(8).reshape(4,2)
arr = np.zeros((6,2))
arr[index] = values
print(arr)

I get:

[[ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 2.  3.]
 [ 0.  0.]
 [ 6.  7.]]

I would like to get:

[[ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 2.  3.]
 [ 0.  0.]
 [ 4.  5.]]
Emanuele Paolini
  • What do you mean by "closest" in your title? Do you want the smallest of the values at a given index, or something else? – askewchan Dec 07 '13 at 15:20
  • I wanted to say "smallest", but for some reason that title was considered invalid by Stack Overflow :-( – Emanuele Paolini Dec 07 '13 at 17:33
  • How does `dist` affect the situation in the **addendum**? My interpretation: "for a repeated number in `index`, choose the row in `values` that corresponds with the smallest number in `dist`". Is that correct? If you want the "smallest" value in `values` for the repeated indices, then it's ambiguous (if say the two rows in `values` are `[0, 2]` and `[1, 1]`) – askewchan Dec 09 '13 at 15:36
  • Your interpretation is correct. I want to insert values from `dist` and choose the one whose `dist` is minimum if multiple values have the same `index` – Emanuele Paolini Dec 09 '13 at 19:43
  • You want to insert values from `values` I thought. My solution should do this for 2d `values`, by sorting with respect to `dist`. – askewchan Dec 09 '13 at 20:11
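The sorting idea raised in the comments could be sketched roughly as follows. This is only one possible reading, not necessarily the commenter's actual solution: it sorts by index and then by dist, and uses np.unique with return_index to pick the first (smallest-dist) row for each distinct index, rather than relying on which duplicate wins in arr[index] = values. The names order, uniq and first are introduced here for illustration.

```python
import numpy as np

index = np.array([0, 3, 5, 5])
dist = np.array([1, 1, 1, 3])
values = np.arange(8).reshape(4, 2)

# lexsort treats the LAST key as the primary one:
# sort by index, and by dist within equal index.
order = np.lexsort((dist, index))

# For each distinct index, `first` is the position of its first occurrence
# in the sorted sequence, i.e. the entry with the smallest dist.
uniq, first = np.unique(index[order], return_index=True)

arr = np.zeros((6, 2))
arr[uniq] = values[order[first]]  # every target row is written exactly once
print(arr)
```

Because each row of arr is assigned at most once, the result does not depend on the order in which NumPy processes repeated indices.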

2 Answers

Use groupby in pandas:

import numpy as np
import pandas as pd
index = [0,3,5,5]
dist = [1,1,1,3]
# group dist by its target index and keep the minimum of each group
s = pd.Series(dist).groupby(index).min()
arr = np.zeros(6)
arr[s.index] = s.values
print(arr)
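For the one-dimensional case, a NumPy-only alternative to the groupby above might look like the sketch below. It is not part of the original answer. It assumes that unused positions should end up as 0, as in the question, and it uses np.minimum.at, which applies the minimum for every occurrence of a repeated index.

```python
import numpy as np

index = [0, 3, 5, 5]
dist = [1, 1, 1, 3]

# Start from +inf so that any real dist value is smaller than the initial content.
arr = np.full(6, np.inf)

# Unbuffered in-place minimum: each (index, dist) pair is applied,
# including pairs that share the same index.
np.minimum.at(arr, index, dist)

# Positions that were never written still hold +inf; reset them to 0.
arr[np.isinf(arr)] = 0
print(arr)
```

The +inf placeholder is a design choice of this sketch; starting from zeros would not work, because the minimum of 0 and any positive dist is 0.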
HYRY

If index is sorted, then itertools.groupby could be used to group that list.

import itertools
import numpy as np
np.array([(g[0],min([x[1] for x in g[1]])) for g in 
    itertools.groupby(zip(index,dist),lambda x:x[0])])

produces

array([[0, 1],
       [3, 1],
       [5, 1]])

This is about 8x slower than the version using np.unique. For N=1000 it is probably similar to the pandas version (I'm guessing, since something is wrong with my pandas import). For larger N the pandas version is better. The pandas approach appears to have a substantial startup cost, which limits its speed for small N.
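itertools.groupby only merges consecutive items with equal keys, which is why the index has to be sorted. If it is not, the pairs can be sorted first. The following is a minimal sketch using only the standard library; the unsorted sample data and the names pairs and result are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input: the two entries for index 5 are not adjacent.
index = [5, 0, 3, 5]
dist = [3, 1, 1, 1]

# Sorting the (index, dist) pairs makes equal indices adjacent.
pairs = sorted(zip(index, dist))

# Group by index and keep the smallest dist of each group.
result = [(k, min(d for _, d in grp))
          for k, grp in groupby(pairs, key=itemgetter(0))]
print(result)
```

Since the tuples are sorted by index and then by dist, the first element of each group already carries the minimum; min() is kept here only to mirror the expression above.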

hpaulj