1

Suppose I have the following numpy array:

arr = numpy.array([[1, 7], [2, 0], [2, 1], [2, 3], [3, 4], [3, 5], [5, 6]])

Suppose I choose a specific key, in this case 2. Then I would like the sorting to be the following:

arr_sorted = [[2, 0], [2, 1], [2, 3], [1, 7], [3, 4], [3, 5], [5, 6]]

The idea is first start with all the elements with key as 2, and then go to the entries whose keys were the values of the previous key.

Starting with 2, the entries are [2, 0], [2, 1], [2, 3]. The next keys after 2 will therefore be 0, 1, 3. There are no entries with 0 as key. There is one entry with 1 as the key: [1, 7]. There are two entries with 3 as the key: [3, 4], [3, 5]. The next unprocessed key is 7, but it has no entries. Same goes for 4. There is a single entry with 5 as the key: [5, 6]. 6 has no entries.

Is there any numpy or dictionary tricks to achieve this?

My nearest attempt is the following:

def bfs_finder(d, start):
  queue = deque([start])
  seen = [start]
  results = []
  while queue:
    _vertices = queue.popleft()
    current = [i for i, a in enumerate(d) if len([x for x in a if x in _vertices])==1 and i not in seen]
    curr1 = [a[1] for i, a in enumerate(d) if len([x for x in a if x in _vertices]) == 1 and i not in seen]
    if len(current)>0:
        results.extend(curr1)
        queue.extend(curr1)
        seen.extend(current)
  return results

But, I am actually getting an error of current = [i for i, a in enumerate(d) if len([x for x in a if x in _vertices])==1 and i not in seen] TypeError: argument of type 'int' is not iterable. Any suggestions on how to fix this error, as well as if there are any good improvements would be highly appreciated.

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
konstant
  • 685
  • 1
  • 7
  • 19

1 Answers1

1

You could use a set to hold the processed keys and a deque or similar stack-compatible container to hold the processed values (a list would work too). Since you have numpy arrays, you could pre-sort the array by rows and extract swaths from it using np.searchsorted on the first column.

The algorithm would be something like this:

  1. Sort the array by rows
  2. Pre-allocate the output array
  3. Add the key to the stack
  4. While stack is not empty
    1. Pop a key off the stack
    2. If key is in the set, discard and continue
    3. Add key to the set
    4. Find key in first column (start and end indices) using bisection
    5. If key is present, copy swath to output array

The sort operation is O(n log(n)). The bisection algorithm will be as well (approximately n searches with O(log(n)) each). Your total algorithmic complexity should therefore not exceed O(n log(n)), which is pretty good for a weird sorting algorithm.

Sorting by rows seems to be fastest using np.argsort according to this answer, and fortunately np.searchsorted accepts a sorter argument.

Here is a sample implementation:

import numpy as np
from collections import deque

def bfs_finder(d, start):
    sorter = np.argsort(d[:, 0])
    done = set()
    todo = deque([start])
    output = np.empty_like(d)
    pos = 0
    while todo:
        key = todo.popleft()
        if key in done:
            continue
        done.add(key)
        left = np.searchsorted(d[:, 0], key, 'left', sorter)
        if left >= d.shape[0] or d[sorter[left], 0] != key:
            continue
        right = np.searchsorted(d[:, 0], key, 'right', sorter)
        next = pos + right - left
        output[pos:next, :] = d[sorter[left:right], :]
        todo.extend(output[pos:next, 1])
        pos = next
    return output

arr = np.array([[1, 7], [2, 0], [2, 1], [2, 3], [3, 4], [3, 5], [5, 6]])
print(bfs_finder(arr, 2))

IDEOne Link

[[2 0]
 [2 1]
 [2 3]
 [1 7]
 [3 4]
 [3 5]
 [5 6]]

This solution assumes that you won't have any keys in the original that are missing from the output. If you run into that problem, subtract the set of the first column from the set of processed keys, and decide how you want to handle the remainder.

You might be able to gain some additional mileage by passing the entire stack to each call of searchsorted instead of doing it one element at a time.

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264