0

Cell list is a data structure that maintains lists of data points in an N-D meshgrid. For example, the following list of 2d indices:

ind = [(0, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0), (1, 1)]

is converted to the following 2x2 cell list:

cell = [[[3, 4, 5], [0, 2]],
        [[1, ],     [6, ]]
       ]

using an O(n) algorithm:

# create an empty 2x2 cell list
cell = [[[] for _ in range(2)] for _ in range(2)]
for i in range(len(ind)):
    cell[ind[i][0], ind[i][1]].append(i)

Is there a vectorized way in numpy that can convert the list of indices (ind) into the cell structure described above?

1 Answers1

0

I don't think there is a good pure numpy but you can either use pythran or---if you don't want to touch a compiler---scipy.sparse cf. this Q&A which is essentially a 1D version of your problem.

[stb_pthr.py] simplified from Most efficient way to sort an array into bins specified by an index array?

import numpy as np

#pythran export sort_to_bins(int[:], int)

def sort_to_bins(idx, mx=-1):
    if mx==-1:
        mx = idx.max() + 1
    cnts = np.zeros(mx + 1, int)
    for i in range(idx.size):
        cnts[idx[i] + 1] += 1
    for i in range(1, cnts.size):
        cnts[i] += cnts[i-1]
    res = np.empty_like(idx)
    for i in range(idx.size):
        res[cnts[idx[i]]] = i
        cnts[idx[i]] += 1
    return res, cnts[:-1]

Compile: pythran stb_pthr.py

Main script:

import numpy as np
try:
    from stb_pthr import sort_to_bins
    HAVE_PYTHRAN = True
except:
    HAVE_PYTHRAN = False
from scipy import sparse

def fallback(flat, maxind):
    res = sparse.csr_matrix((np.zeros_like(flat),flat,np.arange(len(flat)+1)),
                            (len(flat), maxind)).tocsc()
    return res.indices, res.indptr[1:-1]

if not HAVE_PYTHRAN:
    sort_to_bins = fallback

def to_cell(data, shape=None):
    data = np.asanyarray(data)
    if shape is None:
        *shape, = (1 + c.max() for c in data.T)
    flat = np.ravel_multi_index((*data.T,), shape)
    reord, bnds = sort_to_bins(flat, np.prod(shape))
    return np.frompyfunc(np.split(reord, bnds).__getitem__, 1, 1)(
        np.arange(np.prod(shape)).reshape(shape))

ind = [(0, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0), (1, 1)]

print(to_cell(ind))

from timeit import timeit

ind = np.transpose(np.unravel_index(np.random.randint(0, 100, (1_000_000)), (10, 10)))

if HAVE_PYTHRAN:
    print(timeit(lambda: to_cell(ind), number=10)*100)
    sort_to_bins = fallback # !!! MUST REMOVE THIS LINE AFTER TESTING
print(timeit(lambda: to_cell(ind), number=10)*100)

Sample run, output is answer to OP's toy example and timings (in ms) for the pythran and scipy solutions on a 1,000,000 points example:

[[array([3, 4, 5]) array([0, 2])]
 [array([1]) array([6])]]
11.411489499732852
29.700406698975712
Paul Panzer
  • 51,835
  • 3
  • 54
  • 99