
Within a loop that collects samples, I need to obtain statistics about their sorted indices every now and then, for which argsort returns exactly what I need. However, each iteration adds only a single sample, and it is a huge waste of resources to keep passing the whole samples array to argsort, especially since that array is very large. Isn't there an efficient incremental technique equivalent to argsort?

I believe an efficient incremental argsort function can be implemented by maintaining an ordered list of samples, which can be searched for the proper insertion indices once a new sample arrives. Such indices can then be used both to maintain the order of the samples list and to generate the desired incremental argsort-like output. So far, I have used the searchsorted2d function by @Divakar, with slight modifications to obtain the insertion indices, and built a routine that produces the desired output if it is called after each single-sample insertion (b = 1). Yet this is inefficient, and I would like to call the routine only after every k collected samples (e.g. b = 10). In the case of bulk insertions, searchsorted2d seems to return incorrect indices, and that is where I stopped!

import time
import numpy as np

# By Divakar
# See https://stackoverflow.com/a/40588862
def searchsorted2d(a, b):
    m, n = a.shape
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = max_num * np.arange(m)[:,np.newaxis]
    p = np.searchsorted((a + r).ravel(), (b + r).ravel()).reshape(b.shape)
    return p #- n * (np.arange(m)[:,np.newaxis])

# The following works with batch size b = 1,
# but that is not efficient ...
# Can we make it work for any b > 0 value?
class incremental(object):
    def __init__(self, shape):
        # Express each row offset
        self.ranks_offset = np.tile(np.arange(shape[1]).reshape(1, -1),
                                    (shape[0], 1))
        # Storage for sorted samples
        self.a_sorted = np.empty((shape[0], 0))
        # Storage for sort indices
        self.a_ranks = np.empty((shape[0], 0), int)

    def argsort(self, a):
        if self.a_sorted.shape[1] == 0: # Use np.argsort for initialization
            self.a_ranks = a.argsort(axis=1)
            self.a_sorted = np.take_along_axis(a, self.a_ranks, 1)
        else: # In later iterations,
            # search the sorted storage for the insertion points of the new samples
            indices = searchsorted2d(self.a_sorted, a)
            # insert the new global column indices to keep tracking the sort order
            self.a_ranks = np.insert(self.a_ranks, indices.ravel(),
                                     self.ranks_offset.ravel() +
                                     self.a_ranks.shape[1]).reshape((a.shape[0], -1))
            # insert the new values to keep the stored array sorted
            self.a_sorted = np.insert(self.a_sorted, indices.ravel(),
                                      a.ravel()).reshape((a.shape[0], -1))
        return self.a_ranks

M = 1000 # number of iterations
n = 16   # vector size
b = 10   # vectors batch size

# Storage for samples
samples = np.full((n, M), np.nan)

# The proposed approach
inc = incremental((n, b))

c = 0 # iterations counter
tick = time.time()
while c < M:
    if c % b == 0: # Perform batch computations
        #sample_ranks = samples[:,:c].argsort(axis=1)
        sample_ranks = inc.argsort(samples[:,max(0,c-b):c]) # Incremental argsort

        ######################################################
        # Utilize sample_ranks in some magic statistics here #
        ######################################################

    samples[:,c] = np.random.rand(n) # collect a sample
    c += 1 # increment the counter
tock = time.time()

last = ((c-1) // b) * b
sample_ranks_GT = samples[:,:last].argsort(axis=1) # Ground truth
print('Compatibility: {0:.1f}%'.format(
      100 * np.count_nonzero(sample_ranks == sample_ranks_GT) / sample_ranks.size))
print('Elapsed time: {0:.1f}ms'.format(
      (tock - tick) * 1000))
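
As a side note, the way searchsorted2d is used here relies on keeping the row offsets in the returned positions: they are then valid indices into the flattened sorted array, which is exactly what np.insert without an axis argument expects. A tiny example:

a_demo = np.array([[1., 3., 5.],
                   [2., 4., 6.]])
b_demo = np.array([[2.],
                   [5.]])
print(searchsorted2d(a_demo, b_demo))
# [[1]   row 0: 2 is inserted before flattened index 1
#  [5]]  row 1: 5 is inserted before flattened index 5
#        (3 columns of row 0 plus within-row position 2)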

I would expect 100% compatibility with the argsort function, yet it needs to be more efficient than calling argsort from scratch. As for execution time, with an incremental approach 15 ms or so should be more than enough for the given example. So far, only one of these two conditions can be met with any of the techniques I have explored.


To make a long story short, the algorithm shown above seems to be a variant of an order-statistic tree for estimating the data ranks, but it fails when samples are added in bulk (b > 1). So far, it only works when inserting samples one by one (b = 1). However, the arrays are copied every time insert is called, which causes a huge overhead and forms a bottleneck; therefore samples should be added in bulk rather than individually.
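
For what it's worth, the order-statistic idea can be sketched with the third-party sortedcontainers package (which I do not otherwise use here): one SortedList of (value, column) pairs per row supports cheap per-sample insertion, and the argsort-like output is only materialized when the statistics are actually needed. It still inserts samples one by one, though, so it sidesteps rather than fixes the bulk-insertion issue below.

# A sketch of the order-statistic approach (assumes the sortedcontainers package)
import numpy as np
from sortedcontainers import SortedList

class RankTracker(object):
    def __init__(self, n_rows):
        # One sorted container of (value, original_column) pairs per row
        self.rows = [SortedList() for _ in range(n_rows)]
        self.n_cols = 0  # number of columns inserted so far

    def add_block(self, block):
        # block has shape (n_rows, b); its columns are appended in order
        n_rows, b = block.shape
        for j in range(b):
            for i in range(n_rows):
                self.rows[i].add((block[i, j], self.n_cols + j))
        self.n_cols += b

    def argsort(self):
        # Equivalent to samples.argsort(axis=1) over all inserted columns
        return np.array([[col for _, col in row] for row in self.rows])

# Usage: feed blocks of b columns, then read the ranks when needed
n, b = 16, 10
data = np.random.rand(n, 3 * b)
tracker = RankTracker(n)
for start in range(0, data.shape[1], b):
    tracker.add_block(data[:, start:start + b])
assert np.array_equal(tracker.argsort(), data.argsort(axis=1))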

Can you suggest a more efficient incremental argsort algorithm, or at least figure out how to support bulk insertion (b > 1) in the one above?

If you choose to start from where I stopped, then the problem can be reduced to fixing the bug in the following snippet:

import numpy as np

# By Divakar
# See https://stackoverflow.com/a/40588862
def searchsorted2d(a, b):
    m, n = a.shape
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = max_num * np.arange(m)[:,np.newaxis]
    p = np.searchsorted((a + r).ravel(), (b + r).ravel()).reshape(b.shape)
    # It seems the bug is around here...
    #return p - b.shape[0] * np.arange(b.shape[1])[np.newaxis]
    #return p - b.shape[1] * np.arange(b.shape[0])[:,np.newaxis]
    return p

n = 16  # vector size
b = 2   # vectors batch size

a = np.random.rand(n, 1) # Samples array
a_ranks = a.argsort(axis=1) # Initial ranks
a_sorted = np.take_along_axis(a, a_ranks, 1) # Initial sorted array

new_data = np.random.rand(n, b) # New block to append into the samples array
a = np.hstack((a, new_data)) #Append new block

indices = searchsorted2d(a_sorted, new_data) # Compute insertion indices
ranks_offset = np.tile(np.arange(b).reshape(1, -1), (a_ranks.shape[0], 1)) + a_ranks.shape[1] # Ranks to insert
a_ranks = np.insert(a_ranks, indices.ravel(), ranks_offset.ravel()).reshape((n, -1)) # Insert ranks according to their indices
a_ranks_GT = a.argsort(axis=1) # Ranks ground truth

mask = (a_ranks == a_ranks_GT)
print(mask) # Why are they not all True?
assert(np.all(mask)), 'Oops!' # This should not fail, but it does :(


It seems bulk insertion is more involved than I initially thought, and searchsorted2d is not to blame. Take the case of a sorted array a = [1, 2, 5] and a block of two new elements block = [3, 4] to be inserted. If we iterate and insert, then np.searchsorted(a, block[i]) returns [2] and then [3], which is just fine. However, if we call np.searchsorted(a, block) (the desired batched call, equivalent to iterating without inserting), we get [2, 2]. This is problematic for implementing an incremental argsort, since even np.searchsorted(a, block[::-1]) gives the same result. Any idea?
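
The same behaviour can be reproduced directly with plain np.searchsorted on that tiny example:

import numpy as np

a = np.array([1, 2, 5])
block = np.array([3, 4])

# Searching and inserting one element at a time gives the indices we want
tmp = a.copy()
for x in block:
    i = np.searchsorted(tmp, x)
    print(i)                            # 2, then 3
    tmp = np.insert(tmp, i, x)

# A single batched call cannot account for the earlier insertions,
# so both elements map to the same slot
print(np.searchsorted(a, block))        # [2 2]
print(np.searchsorted(a, block[::-1]))  # [2 2]
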
  • With your actual use-case, which steps are the bottleneck(s) (if you had profiled)? – Divakar May 19 '19 at 16:56
  • This is the standard problem solved by the [order-statistic tree](https://en.m.wikipedia.org/wiki/Order_statistic_tree). – Davis Herring May 19 '19 at 17:40
  • @Divakar `insert` is the bottleneck for the `b = 1` case, followed by `searchsorted2d`. For any `b > 1`, the result is just incorrect! – Hamdi May 19 '19 at 18:05
  • @DavisHerring I cannot find a Python implementation for the given technique. I would appreciate it if you can point out such an implementation. – Hamdi May 19 '19 at 18:11
  • @Sahloul: It’s off-topic to ask about existing implementations; “write an order-statistic tree” is a sufficient treatment here. – Davis Herring May 19 '19 at 18:43
  • @DavisHerring Fair enough. Thanks! – Hamdi May 19 '19 at 18:55
  • `np.insert` is a general purpose function written in Python (not compiled); so it's not particularly fast. You can probably create your own new array with appropriate copies faster (especially if you are only adding one item at a time). – hpaulj May 19 '19 at 19:51
  • @hpaulj Thanks for your reply, but in that case I would need to create copies of a huge array just to insert a single row/column. The question is: why `b > 1` cases fail? If I can manage to make it work with, say `b = 10`, then I guess calling `insert` would not be of a big problem to insert ten rows/columns at each call. – Hamdi May 20 '19 at 06:49
  • `np.insert` makes a new copy. It does not operate in-place. – hpaulj May 20 '19 at 07:00
  • @hpaulj I know that. I am referring to the need for internally copying the data every time a new entry is inserted. Thus, inserting entries in blocks is more efficient than inserting them one by one. However, the code I am showing cannot handle bulk insertion properly (`b > 1`). If you find its bug, then that is an answer. – Hamdi May 20 '19 at 08:25
  • @Divakar I wonder why your `searchsorted2d` needed the `- n * (np.arange(m)[:,np.newaxis])` part. It just does not work for me! Also, why when inserting more than a single column, the returned indices are incorrect? If this is solved, then the case with `b > 1` shall work, and that is a perfect answer to this question. – Hamdi May 20 '19 at 09:21
  • @Sahloul Can you implement it with a np.searchsorted in a loop? So, that we could see what's the functionality you are looking for? – Divakar May 20 '19 at 13:35
  • @Divakar I just realized that the problem is not in your function, but rather in the returned indices. I updated the problem description accordingly. Please check the last sentence. – Hamdi May 20 '19 at 14:11

1 Answer


It turned out that the indices returned by searchsorted are not enough to guarantee a sorted array when dealing with batch inputs. If the block being inserted contains two entries that are out of order but end up adjacent in the target array, they receive exactly the same insertion index and are therefore inserted in their current (unsorted) order, causing the glitch. Accordingly, the input block itself needs to be sorted before insertion. See the last paragraph of the question for a numerical example.
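
Here is the failure mode in isolation: when two values get the same insertion index, np.insert places them in the order they are given, so the block must be sorted first.

import numpy as np

a = np.array([1, 2, 5])
block = np.array([4, 3])           # deliberately out of order
idx = np.searchsorted(a, block)
print(idx)                         # [2 2]  -> the same insertion index twice
print(np.insert(a, idx, block))    # [1 2 4 3 5]  -> not sorted anymore

block = np.sort(block)             # sort the block first ...
idx = np.searchsorted(a, block)
print(np.insert(a, idx, block))    # [1 2 3 4 5]  -> ... and the result stays sorted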

By sorting the input block and adapting the remaining parts accordingly, a solution 100% compatible with argsort is obtained, and it is very efficient (elapsed time is 15.6 ms for inserting 1000 entries in blocks of ten, b = 10). This can be reproduced by replacing the buggy incremental class in the question with the following one:

# by Hamdi Sahloul
class incremental(object):
    def __init__(self, shape):
        # Storage for sorted samples
        self.a_sorted = np.empty((shape[0], 0))
        # Storage for sort indices
        self.a_ranks = np.empty((shape[0], 0), int)

    def argsort(self, block):
        # Compute the ranks of the input block
        block_ranks = block.argsort(axis=1)
        # Sort the block accordingly
        block_sorted = np.take_along_axis(block, block_ranks, 1)
        if self.a_sorted.shape[1] == 0: # Initialize using the block data
            self.a_ranks = block_ranks
            self.a_sorted = block_sorted
        else: # In later iterations,
            # searchsorted the input block
            indices = searchsorted2d(self.a_sorted, block_sorted)
            # update the global ranks
            self.a_ranks = np.insert(self.a_ranks, indices.ravel(),
                                     block_ranks.ravel() +
                                     self.a_ranks.shape[1]).reshape((block.shape[0], -1))
            # update the overall sorted array
            self.a_sorted = np.insert(self.a_sorted, indices.ravel(),
                                      block_sorted.ravel()).reshape((block.shape[0], -1))
        return self.a_ranks
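
As a quick sanity check (assuming the searchsorted2d helper from the question is in scope), the class can also be verified on its own against a plain argsort:

n, b, M = 16, 10, 100
data = np.random.rand(n, M)
inc = incremental((n, b))
for start in range(0, M, b):
    ranks = inc.argsort(data[:, start:start + b])
assert np.array_equal(ranks, data.argsort(axis=1))  # 100% compatible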
