using indices with multiple values, how to get the smallest one

Question

I have an index array used to place elements into another array. Sometimes the index has repeated entries, and in that case I would like to keep the smallest of the corresponding values. Is that possible?

import numpy as np

index = [0,3,5,5]
dist = [1,1,1,3]
arr = np.zeros(6)
arr[index] = dist
print(arr)

What I get:

[ 1.  0.  0.  1.  0.  3.]

What I would like to get:

[ 1.  0.  0.  1.  0.  1.]
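With plain NumPy, one option (not taken from the answers below) is the unbuffered ufunc method `np.minimum.at`, which applies `minimum` once for every (index, value) pair, so all duplicates are taken into account. With fancy-index assignment, only one of the duplicate writes survives (in practice the last). A minimal sketch, assuming slots that are never indexed should stay 0 as with `np.zeros(6)`, and that `dist` itself never contains `inf`:

```python
import numpy as np

index = np.array([0, 3, 5, 5])
dist = np.array([1, 1, 1, 3], dtype=float)

# Start from +inf so any dist value written to a slot is smaller than the initial value
arr = np.full(6, np.inf)
# Unbuffered in-place minimum: every (index, dist) pair is applied, duplicates included
np.minimum.at(arr, index, dist)
# Slots that were never indexed still hold +inf; reset them to 0 as np.zeros would
arr[np.isinf(arr)] = 0
print(arr)  # [1. 0. 0. 1. 0. 1.]
```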

Addendum

Actually, I have a third array, `values`, holding the (vector) values to be inserted. So the problem is to insert rows of `values` into `arr` at positions `index`, as in the following. However, when several rows share the same index, I want to keep the row whose `dist` is smallest.

import numpy as np

index = [0,3,5,5]
dist = [1,1,1,3]
values = np.arange(8).reshape(4,2)
arr = np.zeros((6,2))
arr[index] = values
print(arr)

I get:

 [[ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 2.  3.]
 [ 0.  0.]
 [ 6.  7.]]

I would like to get:

 [[ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 2.  3.]
 [ 0.  0.]
 [ 4.  5.]]
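One pure-NumPy way to do this is to sort by `index` and then by `dist`, after which the first row of each index group is the one with the smallest `dist`. The sketch below is an illustration using `np.lexsort` and `np.unique(..., return_index=True)`; it is not claimed to be any answerer's exact code. Because `np.lexsort` is stable, ties in `dist` keep the earlier row.

```python
import numpy as np

index = np.array([0, 3, 5, 5])
dist = np.array([1, 1, 1, 3])
values = np.arange(8).reshape(4, 2)

# Sort rows by index first, then by dist (np.lexsort uses the LAST key as primary)
order = np.lexsort((dist, index))
# In the sorted order, the first occurrence of each index has the smallest dist
uniq, first = np.unique(index[order], return_index=True)
rows = order[first]  # row numbers into `values` that win for each unique index

arr = np.zeros((6, 2))
arr[uniq] = values[rows]
print(arr)
```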

What do you mean by "closest" in your title? Do you want the smallest of the values at a given index, or something else? — askewchan, Dec 07 '13 at 15:20
I wanted to say "smallest" but for some reason the title was considered invalid by stackoverflow :-( — Emanuele Paolini, Dec 07 '13 at 17:33
How does `dist` affect the situation in the **addendum**? My interpretation: "for a repeated number in `index`, choose the row in `values` that corresponds with the smallest number in `dist`". Is that correct? If you want the "smallest" value in `values` for the repeated indices, then it's ambiguous (if say the two rows in `values` are `[0, 2]` and `[1, 1]`) — askewchan, Dec 09 '13 at 15:36
Your interpretation is correct. I want to insert values from `dist` and choose the one whose `dist` is minimum if multiple values have the same `index` — Emanuele Paolini, Dec 09 '13 at 19:43
You want to insert values from `values` I thought. My solution should do this for 2d `values`, by sorting with respect to `dist`. — askewchan, Dec 09 '13 at 20:11

score 1 · Accepted Answer · answered Dec 06 '13 at 12:37

Use groupby in pandas:

import numpy as np
import pandas as pd

index = [0,3,5,5]
dist = [1,1,1,3]
s = pd.Series(dist).groupby(index).min()
arr = np.zeros(6)
arr[s.index] = s.values
print(arr)

HYRY

Actually I have a third array for values... can you adapt your answer to the *addendum* in my question? – Emanuele Paolini Dec 07 '13 at 17:53
Sorry, I don't know why my addendum was missing... now you should find it with sample data. – Emanuele Paolini Dec 09 '13 at 07:47
I got the *addendum* part solved by this question: http://stackoverflow.com/questions/19818756/extract-row-with-maximum-value-in-a-group-pandas-dataframe – Emanuele Paolini Dec 15 '13 at 07:43
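Following the idea in the linked question (picking a row per group), a minimal pandas sketch for the addendum could use `idxmin` on the grouped `dist` column. It returns, for each distinct index, the row label with the smallest `dist`, and that label can then select the row from `values`. The column name `idx` is a hypothetical choice for illustration. Note that `idxmin` returns the first row on ties.

```python
import numpy as np
import pandas as pd

index = [0, 3, 5, 5]
dist = [1, 1, 1, 3]
values = np.arange(8).reshape(4, 2)

df = pd.DataFrame({"idx": index, "dist": dist})
# For each distinct idx, the row label (0..n-1 here) of its smallest dist
best = df.groupby("idx")["dist"].idxmin()

arr = np.zeros((6, 2))
# best.index holds the target positions, best's values hold the winning rows
arr[best.index.to_numpy()] = values[best.to_numpy()]
print(arr)
```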

hpaulj · Answer 2 · 2013-12-08T00:05:35.890

If `index` is sorted, `itertools.groupby` can be used to group the `(index, dist)` pairs:

import itertools
import numpy as np

np.array([(g[0], min([x[1] for x in g[1]])) for g in
    itertools.groupby(zip(index, dist), lambda x: x[0])])

produces

array([[0, 1],
       [3, 1],
       [5, 1]])

This is about 8x slower than the version using np.unique, so for N=1000 it is similar to the Pandas version (I'm guessing, since something is screwy with my Pandas import). For larger N the Pandas version is better. The Pandas approach seems to have a substantial startup cost, which limits its speed for small N.
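`itertools.groupby` only merges adjacent equal keys, so for an unsorted `index` the pairs have to be sorted first. A sketch of that variant adapted to the addendum, carrying the row number along so the winning row of `values` can be selected. The shuffled `index`/`dist`/`values` below are a hypothetical permutation of the question's data, chosen only to show the unsorted case.

```python
import itertools
from operator import itemgetter

import numpy as np

# Hypothetical unsorted permutation of the question's data
index = [5, 0, 3, 5]
dist = [3, 1, 1, 1]
values = np.array([[6, 7], [0, 1], [2, 3], [4, 5]])

# groupby only merges *adjacent* equal keys, so sort the (index, dist, row) triples first
triples = sorted(zip(index, dist, range(len(index))))
best_rows = {}
for key, group in itertools.groupby(triples, key=itemgetter(0)):
    # Within a group, min() on the triples picks the smallest dist (ties -> lowest row)
    _, _, row = min(group)
    best_rows[key] = row

arr = np.zeros((6, 2))
arr[list(best_rows)] = values[list(best_rows.values())]
print(arr)
```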

edited Dec 08 '13 at 00:05

answered Dec 07 '13 at 23:21

Yeah, I found that timing too, I think making a `Series` must be expensive. – askewchan Dec 08 '13 at 19:06
