
I need to detect which spheres are connected to each other. If we have:

radii = np.array([2, 1, 1, 2, 2, 0.5])
poss = np.array([[7, 7, 7], [7.5, 8.5, 6], [0, 0, 0], [-1, -2, -1], [1, 1, 1], [2, 1, 3]])
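Here two spheres count as connected when the distance between their centers is smaller than the sum of their radii; for instance, for the first pair:

```python
import numpy as np

radii = np.array([2, 1, 1, 2, 2, 0.5])
poss = np.array([[7, 7, 7], [7.5, 8.5, 6], [0, 0, 0],
                 [-1, -2, -1], [1, 1, 1], [2, 1, 3]], dtype=float)

# Spheres 0 and 1 overlap: center distance ~1.87 < radii sum 3.
touching_01 = np.linalg.norm(poss[0] - poss[1]) < radii[0] + radii[1]
```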


I want a Boolean array (shape = (number of groups, number of spheres)), or arrays/lists of index arrays/lists, that shows which spheres are connected. So the expected results for this example would be something like:

Boolean_array = np.array([[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 1, 1]], dtype=bool)

object_array = np.array([[0, 1], [2, 3, 4, 5]])

I tried to find a solution with networkx (I'm not very familiar with it), and I don't know if this library can help where the spheres have different radii. I guess the returned ends_ind from my previous code can be helpful in this regard, and I tried to use it as:

G = nx.Graph([*ends_ind])
L = [nx.node_connected_component(G, 0)]
for i in range(len(radii)):
    iter = 0
    for j in L:
        if i in j:
            iter += 1
    if iter == 0:
        L.append(nx.node_connected_component(G, i))

This does not work. The error:

Traceback (most recent call last):
  File "C:/Users/Ali/Desktop/check_2.py", line 31, in <module>
    L.append(nx.node_connected_component(G, i))
  File "<class 'networkx.utils.decorators.argmap'> compilation 8", line 4, in argmap_node_connected_component_5
  File "C:\Users\Ali\anaconda3\envs\PFC_FiPy\lib\site-packages\networkx\algorithms\components\connected.py", line 185, in node_connected_component
    return _plain_bfs(G, n)
  File "C:\Users\Ali\anaconda3\envs\PFC_FiPy\lib\site-packages\networkx\algorithms\components\connected.py", line 199, in _plain_bfs
    nextlevel.update(G_adj[v])
  File "C:\Users\Ali\anaconda3\envs\PFC_FiPy\lib\site-packages\networkx\classes\coreviews.py", line 82, in __getitem__
    return AtlasView(self._atlas[name])
KeyError: 11

Since using my previous code with other libraries would be inefficient (even if it could solve the issue), I am seeking libraries, e.g. networkx, or methods that can do this more efficiently, if possible.

What is the best way to get my expected results, particularly for a large number of spheres (~100,000)?

Ali_Sh

2 Answers


You're trying to utilize networkx too early here. First, you should calculate the geometrical distances for each pair of spheres. A useful trick for this is:

xyz_distances = poss.reshape(6, 1, 3) - poss.reshape(1, 6, 3)
distances = np.linalg.norm(xyz_distances, axis=2)

This gets you a symmetric 6x6 array of Euclidean distances between the sphere centers. Now we need to compare against the maximum possible distances. This is just the sum of radii for each pair of spheres, once again a 6x6 array, which we can calculate as

maximum_distances = radii.reshape(6, 1) + radii.reshape(1, 6)

And now we can compare the two:

>>> connections = distances < maximum_distances
>>> connections
array([[ True,  True, False, False, False, False],
       [ True,  True, False, False, False, False],
       [False, False,  True,  True,  True, False],
       [False, False,  True,  True, False, False],
       [False, False,  True, False,  True,  True],
       [False, False, False, False,  True,  True]])

Which translates to two groups, just like you wanted - and you can get your second expected array via

>>> G = nx.Graph(connections)
>>> list(nx.connected_components(G))
[{0, 1}, {2, 3, 4, 5}]
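If you also need the Boolean membership array from the question, it can be built from that component list (a small sketch; `groups` below simply hard-codes the `connected_components` output for this example):

```python
import numpy as np

# Output of list(nx.connected_components(G)) for the example above.
groups = [{0, 1}, {2, 3, 4, 5}]
n_spheres = 6

boolean_array = np.zeros((len(groups), n_spheres), dtype=bool)
for row, members in enumerate(groups):
    boolean_array[row, list(members)] = True
# boolean_array[0] -> [ True  True False False False False]
```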

Note that this whole thing is going to scale as N^2 in the number of spheres, and you might need to optimize that somehow (say, via scipy.spatial.cKDTree).
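To sketch the cKDTree idea from the comments below (an illustration, not a tuned implementation): query all pairs within twice the largest radius, then filter each candidate pair by its own radii sum. This avoids materializing the full N x N distance matrix:

```python
import numpy as np
from scipy.spatial import cKDTree

radii = np.array([2, 1, 1, 2, 2, 0.5])
poss = np.array([[7, 7, 7], [7.5, 8.5, 6], [0, 0, 0],
                 [-1, -2, -1], [1, 1, 1], [2, 1, 3]], dtype=float)

tree = cKDTree(poss)
# Candidate pairs: centers closer than the largest possible radii sum.
candidates = np.array(sorted(tree.query_pairs(2 * radii.max())))
# Keep only pairs whose center distance is below their own radii sum.
d = np.linalg.norm(poss[candidates[:, 0]] - poss[candidates[:, 1]], axis=1)
touching = candidates[d < radii[candidates[:, 0]] + radii[candidates[:, 1]]]
# touching rows: (0, 1), (2, 3), (2, 4), (4, 5)
```

Feeding `touching` to `nx.Graph` as an edge list (plus `G.add_nodes_from(range(len(radii)))` so isolated spheres form their own components) would then give the same groups without ever building the dense matrix.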

Dominik Stańczak
  • Thanks, I will check. Which `cKDTree` modules are you mentioning? I think they can be utilized when just one specified distance or radius is of interest, not when we have various distances (the sum of each pair). But replacing some SciPy modules (e.g. `cdist` or …) for `np.linalg.norm` may make it a little faster. – Ali_Sh Jun 08 '22 at 10:00
  • Yeah, you're right, I was going to suggest something like `tree = cKDTree(poss); tree.sparse_distance_matrix(tree)`, but I didn't remember that it's got a second required argument like you said. Come to think of it, it still may be helpful if you do `tree.sparse_distance_matrix(tree, 2*radii.max())`. Then you're getting rid of a lot of pairs that obviously cannot ever be connected and can refine down from that. It's an idea to build on. – Dominik Stańczak Jun 08 '22 at 10:11
  • 1
    There's also https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance_matrix.html which may help with the memory overhead via the `threshold` argument. – Dominik Stańczak Jun 08 '22 at 10:19
  • I could not find out how to use `tree.sparse_distance_matrix(tree, 2 * radii.max()).toarray()` for this example to get the same results. [Memory leaks can happen with `cKDTree`](https://stackoverflow.com/a/71332351/13394817), which has no `threshold`. In the first part, `cdist(poss, poss)` will be a little faster. If `cKDTree` can be helpful, I would be grateful if you added it to your answer, too. But for now, your solution is enough if it gets the true results; I am checking it. – Ali_Sh Jun 08 '22 at 10:37
  • I have tested on 18,000 spheres; using `np.linalg.norm` caused a memory leak, so I recommend using `cdist` in this regard, or something similar. – Ali_Sh Jun 08 '22 at 11:09
  • 1
    Well, I missed [your previous post](https://stackoverflow.com/questions/71104627/how-could-i-speed-up-my-written-python-code-spheres-contact-detection-collision), and I feel a little silly now that I see my toy solution for calculating the distances themselves is badly outclassed by what came up there! But I hope at least the adjacency matrix/networkx stuff was helpful. – Dominik Stańczak Jun 08 '22 at 13:03
  • Thanks a lot, your answer solved my problem. I tried to limit the calculation to just the upper triangle of the arrays to reduce memory and time consumption, and it seems to be working correctly. Is there anything to be considered when limiting it this way, to be added to my answer? – Ali_Sh Jun 09 '22 at 18:15
  • 1
    No, I think you've done all that can be done there :) I wasn't aware that [`nx.Graph` is undirected](https://networkx.org/documentation/stable/reference/classes/index.html#which-graph-class-should-i-use), so the lower triangular part of the adjacency matrix is completely useless for our purposes. Glad you were able to trim that 50% runtime! :) – Dominik Stańczak Jun 09 '22 at 19:41

In one of my tests on 18,000 spheres, NumPy's linalg leaked memory, but SciPy's cdist was more memory efficient and worked [ref 1]. It seems the calculations can be limited to just the upper triangle of the arrays (above the diagonal), which is more efficient in terms of memory usage and time consumption. Thanks to Dominik's answer, we can do this process with the Numba accelerator in parallelized no-Python mode:

import numpy as np
import numba as nb
from scipy.spatial.distance import cdist
import networkx as nx


def distances_linalg(radii, poss):
    xyz_distances = poss.reshape(radii.shape[0], 1, 3) - poss.reshape(1, radii.shape[0], 3)
    return radii.reshape(radii.shape[0], 1) + radii.reshape(1, radii.shape[0]), np.linalg.norm(xyz_distances, axis=2)


def distances_cdist(radii, poss):
    return radii.reshape(radii.shape[0], 1) + radii.reshape(1, radii.shape[0]), cdist(poss, poss)


@nb.njit("(Tuple([float64[:, ::1], float64[:, ::1]]))(float64[::1], float64[:, ::1])", parallel=True)
def distances_numba(radii, poss):
    radii_arr = np.zeros((radii.shape[0], radii.shape[0]), dtype=np.float64)
    poss_arr = np.zeros((poss.shape[0], poss.shape[0]), dtype=np.float64)
    for i in nb.prange(radii.shape[0] - 1):
        for j in range(i+1, radii.shape[0]):
            radii_arr[i, j] = radii[i] + radii[j]
            poss_arr[i, j] = ((poss[i, 0] - poss[j, 0]) ** 2 + (poss[i, 1] - poss[j, 1]) ** 2 + (poss[i, 2] - poss[j, 2]) ** 2) ** 0.5
    return radii_arr, poss_arr


def connected_spheres(radii, poss, method=distances_numba):
    maximum_distances, distances = method(radii, poss)
    connections = distances < maximum_distances
    G = nx.Graph(connections)
    return list(nx.connected_components(G))


#         numba radii                      cdist or linalg radii
# [[0.  3.  3.  4.  4.  2.5]            [[4.  3.  3.  4.  4.  2.5]
#  [0.  0.  2.  3.  3.  1.5]             [3.  2.  2.  3.  3.  1.5]
#  [0.  0.  0.  3.  3.  1.5]             [3.  2.  2.  3.  3.  1.5]
#  [0.  0.  0.  0.  4.  2.5]             [4.  3.  3.  4.  4.  2.5]
#  [0.  0.  0.  0.  0.  2.5]             [4.  3.  3.  4.  4.  2.5]
#  [0.  0.  0.  0.  0.  0. ]]            [2.5 1.5 1.5 2.5 2.5 1. ]]

#                                numba poss                                 
# [[ 0.          1.87082869 12.12435565 14.45683229 10.39230485  8.77496439]
#  [ 0.          0.         12.82575534 15.21512405 11.11305539  9.77241014]
#  [ 0.          0.          0.          2.44948974  1.73205081  3.74165739]
#  [ 0.          0.          0.          0.          4.12310563  5.83095189]
#  [ 0.          0.          0.          0.          0.          2.23606798]
#  [ 0.          0.          0.          0.          0.          0.        ]]

#                           cdist or linalg poss                                 
# [[ 0.          1.87082869 12.12435565 14.45683229 10.39230485  8.77496439]
#  [ 1.87082869  0.         12.82575534 15.21512405 11.11305539  9.77241014]
#  [12.12435565 12.82575534  0.          2.44948974  1.73205081  3.74165739]
#  [14.45683229 15.21512405  2.44948974  0.          4.12310563  5.83095189]
#  [10.39230485 11.11305539  1.73205081  4.12310563  0.          2.23606798]
#  [ 8.77496439  9.77241014  3.74165739  5.83095189  2.23606798  0.        ]]

In my test on 18,000 spheres, this was at least 2 times faster than cdist. I think Numba will be very helpful for avoiding memory leaks on large arrays compared to cdist.
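As a side note, if the networkx dependency is undesirable for the grouping step, a small union-find over the True entries of the (upper-triangle) connections matrix gives the same groups. This is my own sketch, not code from either answer:

```python
import numpy as np

def group_spheres(connections):
    """Group spheres from a boolean (upper-triangle) adjacency matrix
    using union-find with path halving."""
    n = connections.shape[0]
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union every connected pair.
    for i, j in zip(*np.nonzero(connections)):
        if i != j:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj

    # Collect members per root; isolated spheres form singleton groups.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Example with the question's contact pairs (upper triangle only):
conn = np.zeros((6, 6), dtype=bool)
conn[[0, 2, 2, 4], [1, 3, 4, 5]] = True  # (0,1), (2,3), (2,4), (4,5)
groups = group_spheres(conn)  # -> [[0, 1], [2, 3, 4, 5]]
```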

Solution 2:

We can write distances_numba based on an improved Numba-accelerated cdist code. In this solution I modified that code to operate just on the upper triangle of the arrays:

@nb.njit("float64[:, ::1](float64[:, ::1])", parallel=True)
def dot_triu(poss):
    assert poss.shape[1] == 3
    poss_T = poss.T
    dot = np.zeros((poss.shape[0], poss.shape[0]), dtype=poss.dtype)
    for i in nb.prange(poss.shape[0] - 1):
        for j in range(i + 1, poss.shape[0]):
            dot[i, j] = poss[i, 0] * poss_T[0, j] + poss[i, 1] * poss_T[1, j] + poss[i, 2] * poss_T[2, j]
    return dot


@nb.njit("float64[::1](float64[:, ::1])", parallel=True)
def poss_(poss):
    TMP_A = np.zeros(poss.shape[0], dtype=np.float64)
    for i in nb.prange(poss.shape[0]):
        for j in range(poss.shape[1]):
            TMP_A[i] += poss[i, j] ** 2
    return TMP_A


@nb.njit("(Tuple([float64[:, ::1], float64[:, ::1]]))(float64[::1], float64[:, ::1])", parallel=True)
def distances_numba(radii, poss):
    poss_arr = dot_triu(poss)
    TMP_A = poss_(poss)

    radii_arr = np.zeros((radii.shape[0], radii.shape[0]), dtype=np.float64)
    for i in nb.prange(poss.shape[0] - 1):
        for j in range(i + 1, poss.shape[0]):
            radii_arr[i, j] = radii[i] + radii[j]
            poss_arr[i, j] = (-2. * poss_arr[i, j] + TMP_A[i] + TMP_A[j]) ** 0.5

    return radii_arr, poss_arr
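This rewrite relies on the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * (a . b), which can be sanity-checked in plain NumPy:

```python
import numpy as np

# Sanity check of the identity used in distances_numba above:
# ||a - b||^2 == ||a||^2 + ||b||^2 - 2 * (a . b)
rng = np.random.default_rng(0)
a, b = rng.random(3), rng.random(3)

direct = np.sum((a - b) ** 2)
expanded = np.sum(a ** 2) + np.sum(b ** 2) - 2 * np.dot(a, b)
assert np.isclose(direct, expanded)
```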
Ali_Sh