(Python) Mapping between two arrays with a precedence array

Question

Given a source array

src = np.random.rand(320,240)

and an index array

idx = np.indices(src.shape).reshape(2, -1)
np.random.shuffle(idx.T)

we can map the linear index i in src to the 2-dimensional index idx[:,i] in a destination array dst via

dst = np.empty_like(src)
dst[tuple(idx)] = src.ravel()

This is discussed in "Python: Mapping between two arrays with an index array".

However, if this mapping is not 1-to-1, i.e., multiple entries in src map to the same entry in dst, according to the docs it is unspecified which of the source entries will be written to dst:

For advanced assignments, there is in general no guarantee for the iteration order. This means that if an element is set more than once, it is not possible to predict the final result.

If we are additionally given a precedence array

p = np.random.rand(*src.shape)

how can we use p to disambiguate this situation, i.e., write the entry with highest precedence according to p?
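To pin down the intended semantics, here is a minimal sketch of a slow reference loop. The function name `map_with_precedence_loop`, the hand-made example values, and the use of NaN for cells that no source entry maps to are all assumptions made for illustration; they are not part of the question.

```python
import numpy as np

def map_with_precedence_loop(idx, p, src):
    """Slow reference: each destination cell keeps the source entry with the largest p."""
    dst = np.full(src.shape, np.nan)        # assumption: NaN marks cells nothing maps to
    best = np.full(src.shape, -np.inf)      # highest precedence seen so far, per cell
    flat_idx = idx.reshape(2, -1)           # column i is the 2-D target of source entry i
    flat_p = p.ravel()
    flat_src = src.ravel()
    for i, (r, c) in enumerate(flat_idx.T):
        if flat_p[i] > best[r, c]:
            best[r, c] = flat_p[i]
            dst[r, c] = flat_src[i]
    return dst

# Hand-made example: source entries 0 and 2 both map to destination cell (1, 0).
idx = np.array([[[1, 0], [1, 0]],
                [[0, 1], [0, 0]]])
p = np.array([[0.9, 0.8], [0.6, 0.7]])
src = np.arange(4).reshape(2, 2)
result = map_with_precedence_loop(idx, p, src)
print(result)
```

Here source entry 0 (precedence 0.9) beats source entry 2 (precedence 0.6) for cell (1, 0). The loop is only meant as a correctness baseline; it will be far too slow for large arrays.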

score 2 · Accepted Answer · answered May 07 '18 at 06:03

Here is a method that uses a sparse matrix for sorting. It has a large overhead but scales better than argsort, presumably because it uses some radix-sort-like method. Duplicate indices that do not have the highest precedence are explicitly set to -1. We make the destination array one cell too big; the surplus cell serves as a trash can.

import numpy as np
from scipy import sparse

N = 2
idx = np.random.randint(0, N, (2, N, N))
prec = np.random.random((N, N))
src = np.arange(N*N).reshape(N, N)

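def f_sparse(idx, prec, src):
    # Flatten the 2-D destination indices to linear indices, one per source entry
    idx = np.ravel_multi_index(idx, src.shape).ravel()
    # CSR with one entry per row (row = source position, column = destination);
    # converting to CSC groups the entries by destination index
    sp = sparse.csr_matrix((prec.ravel(), idx, np.arange(idx.size+1)),
                           (idx.size, idx.size)).tocsc()
    # First position where indptr reaches its maximum; columns from top onward are empty
    top = sp.indptr.argmax()
    # Maximum precedence per destination, repeated once per entry in that group
    mx = np.repeat(np.maximum.reduceat(sp.data, sp.indptr[:top]),
                   np.diff(sp.indptr[:top+1]))
    res = idx.copy()
    # Redirect every entry that is not the maximum of its group to the trash cell
    res[sp.indices[sp.data != mx]] = -1

    # One surplus cell at the end collects the redirected values; NaN marks unset cells
    dst = np.full((idx.size + 1,), np.nan)
    dst[res] = src.ravel()
    return dst[:-1].reshape(src.shape)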

print(idx)
print(prec)
print(src)
print(f_sparse(idx, prec, src))

Sample run:

[[[1 0]
  [1 0]]

 [[0 1]
  [0 0]]]
[[0.90995366 0.92095225]
 [0.60997092 0.84092015]]
[[0 1]
 [2 3]]
[[ 3.  1.]
 [ 0. nan]]
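To see how the CSR-to-CSC conversion groups clashing indices, the following sketch rebuilds the sparse matrix for the sample run above. The names `flat_idx` and `flat_prec` are introduced here for illustration; their values are the flattened linear destination indices and precedences from the sample.

```python
import numpy as np
from scipy import sparse

# Sample run, flattened: linear destination index and precedence per source entry
flat_idx = np.array([2, 1, 2, 0])
flat_prec = np.array([0.90995366, 0.92095225, 0.60997092, 0.84092015])

n = flat_idx.size
# One entry per row: row = source position, column = destination index
sp = sparse.csr_matrix((flat_prec, flat_idx, np.arange(n + 1)), (n, n)).tocsc()

print(sp.indptr)   # column boundaries: where each destination's group starts
print(sp.indices)  # source positions, now ordered by destination index
print(sp.data)     # precedences in the same order
```

After the conversion, the two entries that target destination 2 (source positions 0 and 2) sit next to each other, so `np.maximum.reduceat` over the column boundaries yields one maximum per destination.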

Thanks! I haven't fully understood the code yet, but if it uses sorting, why doesn't it suffer from the same (wrong) left-to-right evaluation assumption as unutbu's answer? — ASML, May 07 '18 at 15:06
@ASML It sorts by index, not by precedence; the purpose of the sorting is to have the clashing indices right next to each other, so they easily can be dealt with as a group (using `ufunc.reduceat`). As I wrote in the answer, duplicate indices which are not of maximal precedence are explicitly overwritten with `-1`, so upon assignment their corresponding values go to the last cell in the destination array. Now, we simply overallocate one cell and discard it in the end. — Paul Panzer, May 07 '18 at 15:21
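The grouping idea described in the comment above can also be sketched with a plain stable argsort in place of the sparse-matrix conversion. This is a substitute technique for illustration, not the code from the answer; the name `f_argsort` is hypothetical, and the sketch assumes a non-empty input.

```python
import numpy as np

def f_argsort(idx, prec, src):
    flat = np.ravel_multi_index(idx, src.shape).ravel()
    order = np.argsort(flat, kind="stable")      # clashing indices become neighbours
    sorted_idx = flat[order]
    sorted_prec = prec.ravel()[order]
    # Start offset of each run of equal destination indices
    starts = np.flatnonzero(np.r_[True, sorted_idx[1:] != sorted_idx[:-1]])
    counts = np.diff(np.r_[starts, flat.size])
    # Group maximum, repeated once per member of the group
    group_max = np.repeat(np.maximum.reduceat(sorted_prec, starts), counts)
    res = flat.copy()
    res[order[sorted_prec != group_max]] = -1    # losers are redirected to the trash cell
    dst = np.full(flat.size + 1, np.nan)         # one surplus cell, discarded below
    dst[res] = src.ravel()
    return dst[:-1].reshape(src.shape)

# Data from the sample run in the answer
idx = np.array([[[1, 0], [1, 0]],
                [[0, 1], [0, 0]]])
prec = np.array([[0.90995366, 0.92095225],
                 [0.60997092, 0.84092015]])
src = np.arange(4).reshape(2, 2)
out = f_argsort(idx, prec, src)
print(out)
```

As in the answer, entries that tie for the maximum precedence within a group are all kept, so which of them ends up in the destination remains unspecified.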
