Sorting the nodes of a tree in breadth-first order in numpy

Question

Suppose I have the following numpy array:

arr = numpy.array([[1, 7], [2, 0], [2, 1], [2, 3], [3, 4], [3, 5], [5, 6]])

Suppose I choose a specific key, in this case 2. Then I would like the sorting to be the following:

arr_sorted = [[2, 0], [2, 1], [2, 3], [1, 7], [3, 4], [3, 5], [5, 6]]

The idea is first start with all the elements with key as 2, and then go to the entries whose keys were the values of the previous key.

Starting with 2, the entries are [2, 0], [2, 1], [2, 3]. The next keys after 2 will therefore be 0, 1, 3. There are no entries with 0 as key. There is one entry with 1 as the key: [1, 7]. There are two entries with 3 as the key: [3, 4], [3, 5]. The next unprocessed key is 7, but it has no entries. Same goes for 4. There is a single entry with 5 as the key: [5, 6]. 6 has no entries.

Is there any numpy or dictionary tricks to achieve this?

My nearest attempt is the following:

def bfs_finder(d, start):
  queue = deque([start])
  seen = [start]
  results = []
  while queue:
    _vertices = queue.popleft()
    current = [i for i, a in enumerate(d) if len([x for x in a if x in _vertices])==1 and i not in seen]
    curr1 = [a[1] for i, a in enumerate(d) if len([x for x in a if x in _vertices]) == 1 and i not in seen]
    if len(current)>0:
        results.extend(curr1)
        queue.extend(curr1)
        seen.extend(current)
  return results

But, I am actually getting an error of current = [i for i, a in enumerate(d) if len([x for x in a if x in _vertices])==1 and i not in seen] TypeError: argument of type 'int' is not iterable. Any suggestions on how to fix this error, as well as if there are any good improvements would be highly appreciated.

Please do some additional explanation and/or give more examples of input and desired output. Also, please show the not so efficient way of doing the sorting, the one you have already found and implemented yourself. — Yunnosch, May 29 '18 at 17:51
@MadPhysicist 7 has the key 1, and one was the value of the previous key 2. — konstant, May 29 '18 at 17:59
@konstant. Thanks. So we basically step along, using the last unprocessed value to keep adding batches of keys. Gotcha — Mad Physicist, May 29 '18 at 18:00
@konstant. While this is an interesting problem, I have to ask if you have actually made any effort to solve it yourself, besides asking for a solution on SO. — Mad Physicist, May 29 '18 at 18:01
Ping me once you edit in your own attempt and an explanation of why it's not satisfactory. — Mad Physicist, May 29 '18 at 18:16
@MadPhysicist I do have an attempt. Unfortunately, there is an error in the code which I haven't been able to fix. But, I have posted my attempt regardless. — konstant, May 29 '18 at 18:46
`x for x in a`: `a` is an integer. Did you mean `range(a)` or something? — Mad Physicist, May 29 '18 at 19:57
What happens to remaining elements when you run out of processed values? — Mad Physicist, May 29 '18 at 20:05
I've changed the title of the question to better describe what you're actually asking - your data is a tree, of the form `[[parent_1, child_1], [parent_2, child_2], ...]` — Eric, May 29 '18 at 20:45
What determines the order of `[2, 0], [2, 1], [2, 3]` in your example? Are they left in the order they started in? Are they sorted by child id? — Eric, May 29 '18 at 20:46
Say you start with `[2, 1], [2, 0], [2, 3]`, what would be the result after sorting? — Tai, May 29 '18 at 20:54
@konstant Are you willing to accept an answer which returns `[[2, 0], [2, 1], [2, 3]]` for @Tai's example? — Mad Physicist, May 29 '18 at 21:10

Mad Physicist · Answer 1 · 2018-05-29T21:03:43.493

You could use a set to hold the processed keys and a deque or similar stack-compatible container to hold the processed values (a list would work too). Since you have numpy arrays, you could pre-sort the array by rows and extract swaths from it using np.searchsorted on the first column.

The algorithm would be something like this:

Sort the array by rows
Pre-allocate the output array
Add the key to the stack
While stack is not empty
1. Pop a key off the stack
2. If key is in the set, discard and continue
3. Add key to the set
4. Find key in first column (start and end indices) using bisection
5. If key is present, copy swath to output array

The sort operation is O(n log(n)). The bisection algorithm will be as well (approximately n searches with O(log(n)) each). Your total algorithmic complexity should therefore not exceed O(n log(n)), which is pretty good for a weird sorting algorithm.

Sorting by rows seems to be fastest using np.argsort according to this answer, and fortunately np.searchsorted accepts a sorter argument.

Here is a sample implementation:

import numpy as np
from collections import deque

def bfs_finder(d, start):
    sorter = np.argsort(d[:, 0])
    done = set()
    todo = deque([start])
    output = np.empty_like(d)
    pos = 0
    while todo:
        key = todo.popleft()
        if key in done:
            continue
        done.add(key)
        left = np.searchsorted(d[:, 0], key, 'left', sorter)
        if left >= d.shape[0] or d[sorter[left], 0] != key:
            continue
        right = np.searchsorted(d[:, 0], key, 'right', sorter)
        next = pos + right - left
        output[pos:next, :] = d[sorter[left:right], :]
        todo.extend(output[pos:next, 1])
        pos = next
    return output

arr = np.array([[1, 7], [2, 0], [2, 1], [2, 3], [3, 4], [3, 5], [5, 6]])
print(bfs_finder(arr, 2))

IDEOne Link

[[2 0]
 [2 1]
 [2 3]
 [1 7]
 [3 4]
 [3 5]
 [5 6]]

This solution assumes that you won't have any keys in the original that are missing from the output. If you run into that problem, subtract the set of the first column from the set of processed keys, and decide how you want to handle the remainder.

You might be able to gain some additional mileage by passing the entire stack to each call of searchsorted instead of doing it one element at a time.

Sorting the nodes of a tree in breadth-first order in numpy

1 Answers1