
Imagine we have a tree of values stored in a NumPy array. For example -

In [1]: import numpy as np

In [2]: tree = np.array([[0, 6], [0, 4], [1, 3], [2, 9], [3, 1], [2, 7]]);

In [3]: tree.shape
Out[3]: (6, 2)

Each node in the tree is a row in the array. The first row `tree[0]` is the root node `[0, 6]`. The first column `tree[:,0]` contains the row number of the node's parent and the second column `tree[:,1]` contains the node's value attribute.
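
For instance, reading the parent row number and the value of the sixth node directly:

In [4]: tree[5, 0], tree[5, 1]
Out[4]: (2, 7)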

What is the most efficient way to access the value attributes of a given node up to the root via its ancestors? For example, for the sixth node `[2, 7]`, this would be `[7, 3, 4, 6]`.

One method is to iteratively read the array from the starting node upwards, using the first column as an index for the next ancestor, for example -

In [20]: i = 5
    ...: values = []
    ...: while i > 0:
    ...:     values.append(tree[i, 1])
    ...:     i = tree[i, 0]
    ...: values.append(tree[0, 1])
    ...: print(values)
[7, 3, 4, 6]

but I found this to be slow for large complex trees. Is there a faster way?

Background to my question - I am trying to implement the Monte Carlo tree search (MCTS) algorithm.

user2309803
  • Is [anytree](https://stackoverflow.com/a/39292920/20599896) an option? Sorry I couldn't profile, as the sample tree is rather short and cProfile gives me 0.000 runtime for both options. – Cpt.Hook Dec 05 '22 at 09:09
  • When exactly would it be too slow? Can you quantify this? What is the size and the height of the tree that results in a performance problem? And what is the time it takes for producing the `values` list in that case? – trincot Dec 05 '22 at 10:44
  • I suspect for this kind of data and iteration a list will be more efficient. – hpaulj Dec 05 '22 at 12:29

2 Answers


For an iterative operation like this, Numpy does not provide any (efficient) vectorized function. A solution to speed this up is to use Numba (a JIT compiler) and return a Numpy array (since Numba can operate more efficiently on them). Here is an example:

import numba as nb
import numpy as np

@nb.njit(['(int16[:,:], int_)', '(int32[:,:], int_)', '(int64[:,:], int_)'])
def compute(tree, i):
    values = np.empty(max(tree.shape[0], 1), dtype=tree.dtype)
    cur = 0
    while i > 0:
        assert cur < values.size
        values[cur] = tree[i, 1]
        i = tree[i, 0]
        cur += 1
    assert cur < values.size
    values[cur] = tree[0, 1]
    return values[:cur+1] # Consider using copy() if cur << tree.shape[0]

print(compute(tree, 5))

It takes 0.76 µs on my machine as opposed to 1.36 µs for the initial code. However, ~0.54 µs is spent calling the JIT function and checking the input parameters, and 0.1~0.2 µs is spent allocating the output array. Thus, basically 90% of the time of the Numba function is a constant overhead. It should be much faster for large trees. If you have many small trees to compute, you can call it from another Numba function so as to avoid the overhead of calling a JIT function from the slow CPython interpreter. When called from a JIT function, the above function takes only 0.063 µs on the example input. Thus, the Numba function can be up to 22 times faster in this case.
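
For example, here is a sketch of that (not from the original answer; the batching function `sum_ancestor_values` and its summing workload are just illustrative, and it assumes the tree uses a dtype covered by the signatures above):

@nb.njit
def sum_ancestor_values(tree, nodes):
    # compute() is called from compiled code here, so the CPython -> JIT call
    # overhead is paid once per batch instead of once per queried node.
    totals = np.empty(nodes.size, dtype=np.int64)
    for k in range(nodes.size):
        totals[k] = compute(tree, nodes[k]).sum()
    return totals

print(sum_ancestor_values(tree, np.array([5, 4, 3])))  # [20 23 22]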

Note that it is better to use a small datatype for the tree since random accesses are expensive in large arrays. The smaller the array in memory, the more likely it is to fit in CPU caches, and the faster the computation. For trees with fewer than 65536 items, it is safe to use a uint16 datatype (while the default one is int32 on Windows and int64 on Linux, that is, respectively 2 and 4 times bigger).
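
For example, a minimal sketch of that (int16 is already covered by the signatures above; a uint16 overload would have to be added to the signature list explicitly):

small_tree = tree.astype(np.int16)  # 2 bytes per item instead of 4 or 8
print(compute(small_tree, 5))       # same result: [7 3 4 6]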

Jérôme Richard

Not sure it could help. It depends on how often you do this operation, and also how big and how deep your tree is.

But basically, my suggestion, if you need to accelerate this, would be to precompute everything for every node; then you can do it NumPy-style:

preList = [tree]
idx = tree[:, 0]
while (preList[-1][:, 0] != 0).any():
    preList.append(preList[-1][idx])
preList.append(preList[-1][idx])  # one more level so every chain reaches the root itself
pre = np.stack(preList)

# Values of 6th node (root value repeated once at the end)
pre[:, 5][:, 1]
# array([7, 3, 4, 6, 6])

Note that every node gets the same number of levels (the maximum tree depth plus one), with the root value repeated at the end for shorter chains. But you can trim the chain just after the first level whose `pre[:,5][:,0]` entry is 0 (the next level is the root).
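
For example, a small illustrative helper (the name `ancestor_values` is just a placeholder) that does this trimming using the parent column of `pre`:

def ancestor_values(pre, node):
    parents = pre[:, node, 0]
    root_level = np.argmax(parents == 0) + 1  # level right after the first child-of-root entry
    return pre[:root_level + 1, node, 1]

print(ancestor_values(pre, 5))  # [7 3 4 6]
print(ancestor_values(pre, 4))  # [1 9 3 4 6]
# (the root itself, node 0, would get its value twice; special-case it if needed)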

This is just the same thing you are doing (from a row #i = [j, v], getting the parent row #j), just done once and for all on all nodes, to get a 3D array whose first axis is the ancestor axis.

Note that if the timing of your current algorithm is unbearable, meaning that your tree is very deep, then chances are that mine would suffer from heavy memory usage, since the 3D array is the size of your 2D array times the tree depth.
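
As a quick sanity check of that memory cost (just an illustration):

# One full copy of the 2D tree per ancestor level:
print(pre.nbytes == len(preList) * tree.nbytes)  # True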

For CPU usage though, even if, from a strict number-of-operations point of view, what I do is just precompute your algorithm for all nodes in the tree, it is probably worth it even if you don't need all of those results, because, obviously, NumPy's vectorized indexing is faster than a Python loop.

With a tree as small as yours, it takes 46 such requests before my method is cheaper than yours (that is, 46 requests to absorb the cost of precomputation), which is not good considering that you have only 6 nodes.

But for a 13-node tree, precomputation takes 76 µs, your code needs 3.12 µs per request and mine 350 ns, so the number of requests before precomputation pays off drops to 27. Still more than the number of nodes (13).

For a 27-node tree, precomputation takes 84 µs, your code needs 3.81 µs per request and mine still 350 ns, so precomputation becomes profitable after 24 requests.
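
Those break-even counts are just arithmetic on the measured timings; for example:

# Number of requests needed to absorb the precomputation cost (times in µs)
for precompute, yours, mine in [(76, 3.12, 0.35), (84, 3.81, 0.35)]:
    print(round(precompute / (yours - mine)))  # 27, then 24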

In CPU time, precomputation is O(n·log(n)) and each of my requests is O(1), while your code is O(log(n)) per request. So in theory, with k requests, that is O(n·log(n) + k) on my side and O(k·log(n)) on yours, which become equivalent when k ≈ n. As I said, it is just like calling your code on all possible nodes. But because of NumPy's efficiency, the precomputation costs less than calling your code n times, so it is worthwhile even when k is smaller than n.

chrslg
  • You have declared `preList` as having a single element - the numpy array `tree` - and then access the last element `[-1]`, which returns the original `tree`. Next you append more arrays to the list; what do they represent? Does this pattern have a name? – user2309803 Dec 17 '22 at 15:12
  • I understand now that you append new arrays to `preList` which represent the parents, grandparents, etc. This is fine to get the values of a particular branch but could we update them without looping, i.e by vectorizing? – user2309803 Dec 17 '22 at 15:22
  • This is vectorization. Sure, I don't avoid looping over ancestors (`preList` starts with tree, that is, the list of nodes; after the first iteration, it contains the list of nodes and the list of their parents; after the second iteration, the list of nodes, their parents, and their grandparents; etc.). So, same as you were doing, except that instead of doing it on just one node, I do it on all nodes at the same time. Which is precisely what vectorization is. – chrslg Dec 17 '22 at 23:56
  • @chrslg I think you process each ancestor level at the same time, which is more efficient than looping over each node. I suppose it's not possible to process all ancestor levels up to the root at the same time, which was my original aim. Thanks for your time. – user2309803 Dec 18 '22 at 12:06
  • @user2309803 It may be possible. Some people are way more able than I am with numpy magic. But since none of them decided to answer to say "I can do better", that's an indication that it may also be impossible. – chrslg Dec 18 '22 at 12:53
  • @user2309803 Anyway, if you can use numba (and nowadays almost everybody can; it is available on most platforms, and the exceptions, such as MicroPython, usually don't have numpy either), it is almost always faster. – chrslg Dec 18 '22 at 12:58