5

Numpy's ufuncs have a reduceat method which runs them over contiguous partitions within an array. So instead of writing:

import numpy as np
a = np.array([4, 0, 6, 8, 0, 9, 8, 5, 4, 9])
split_at = [4, 5]
maxima = [max(subarray for subarray in np.split(a, split_at)]

I can write:

maxima = np.maximum.reduceat(a, np.hstack([0, split_at]))

Both will return the maximum values in slices a[0:4], a[4:5], a[5:10], being [8, 0, 9].

I would like a similar function to perform argmax, noting that I would only like a single maximum index in each partition: [3, 4, 5] with the above a and split_at (despite indices 5 and 9 both obtaining the maximum value in the last group), as would be returned by

np.hstack([0, split_at]) + [np.argmax(subarray) for subarray in np.split(a, split_at)]

I will post a possible solution below, but would like to see one that is vectorized without creating an index over groups.

joeln
  • 3,563
  • 25
  • 31

2 Answers2

1

This solution involves building an index over groups ([0, 0, 0, 0, 1, 2, 2, 2, 2, 2] in the above example).

group_lengths = np.diff(np.hstack([0, split_at, len(a)]))
n_groups = len(group_lengths)
index = np.repeat(np.arange(n_groups), group_lengths)

Then we can use:

maxima = np.maximum.reduceat(a, np.hstack([0, split_at]))
all_argmax = np.flatnonzero(np.repeat(maxima, group_lengths) == a)
result = np.empty(len(group_lengths), dtype='i')
result[index[all_argmax[::-1]]] = all_argmax[::-1]

To get [3, 4, 5] in result. The [::-1]s ensure that we get the first rather than the last argmax in each group.

This relies on the fact that the last index in fancy assignment determines the value assigned, which @seberg says one shouldn't rely on (and a safer alternative can be achieved with result = all_argmax[np.unique(index[all_argmax], return_index=True)[1]], which involves a sort over len(maxima) ~ n_groups elements).

Community
  • 1
  • 1
joeln
  • 3,563
  • 25
  • 31
  • 1
    This has algorithmic complexity similar to your `np.unique` solution, but doesn't involve the `index` array at all. Once you have `all_argmax` you can directly do: `all_argmax[np.searchsorted(all_argmax, np.hstack([0, split_at]))]`. – Jaime Mar 02 '14 at 08:10
  • Thanks, @Jaime, `searchsorted` always turns up in places I don't think of it. – joeln Mar 02 '14 at 21:52
  • 1
    @Jaime your proposed solution doesn't give me the same result as the `np.unique` implementation. Am I missing something? – Vlad Dec 19 '19 at 22:33
  • Yes, the code I wrote is obviously wrong: `all_argmax` is a non-sorted array, so you can't call `np.searchsorted` on it. I haven't run it, so bugs may remain, but I think there's a `np.nonzero()` call missing around the innermost `all_argmax`, i.e.: `all_argmax[np.searchsorted(np.nonzero(all_argmax), np.hstack([0, split_at]))]` – Jaime Jan 07 '20 at 10:14
1

Inspired by this question, ive added argmin/max functionality to the numpy_indexed package. Here is what the corresponding test looks like. Note that the keys may be in any order (and of any kind supported by npi):

def test_argmin():
    keys   = [2, 0, 0, 1, 1, 2, 2, 2, 2, 2]
    values = [4, 5, 6, 8, 0, 9, 8, 5, 4, 9]
    unique, amin = group_by(keys).argmin(values)
    npt.assert_equal(unique, [0, 1, 2])
    npt.assert_equal(amin,   [1, 4, 0])
Eelco Hoogendoorn
  • 10,459
  • 1
  • 44
  • 42