
I need to find a fast way to get the indices of the neighbors of an element that have the same value as that element.

For example:

arr = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]

indices = func(arr, 6)
# [5, 6, 7, 8]

The 6th element has value 1, so I need the full slice containing the 6th element and all of its neighbors with the same value.

It is like part of a flood fill algorithm. Is there a fast way to do it in numpy? Is there a way to do it for a 2D array?

EDIT

Let's see some performance tests:

import numpy as np

np.random.seed(1488)

# 5000 zeros with 100 randomly placed runs of 50 ones
arr = np.zeros(5000)
for x in np.random.randint(0, 5000, size=100):
    arr[x:x+50] = 1

I will compare the function from @Ehsan:

def func_Ehsan(arr, idx):
    change = np.insert(np.flatnonzero(np.diff(arr)), 0, -1)
    loc = np.searchsorted(change, idx)
    start = change[max(loc-1,0)]+1 if loc<len(change) else change[loc-1]
    end = change[min(loc, len(change)-1)]
    return (start, end)

change = np.insert(np.flatnonzero(np.diff(arr)), 0, -1)
def func_Ehsan_same_arr(arr, idx):
    loc = np.searchsorted(change, idx)
    start = change[max(loc-1,0)]+1 if loc<len(change) else change[loc-1]
    end = change[min(loc, len(change)-1)]
    return (start, end)

with my pure Python function:

def my_func(arr, index):

    # walk outwards from index while the value stays the same;
    # returns the half-open slice (start, end) of the run
    val = arr[index]
    size = arr.size

    end = index + 1
    while end < size and arr[end] == val:
        end += 1
    start = index - 1
    while start > -1 and arr[start] == val:
        start -= 1

    return start + 1, end

Take a look:

np.random.seed(1488)
%timeit my_func(arr, np.random.randint(0, 5000))
# 42.4 µs ± 700 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


np.random.seed(1488)
%timeit func_Ehsan(arr, np.random.randint(0, 5000))
# 115 µs ± 1.92 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

np.random.seed(1488)
%timeit func_Ehsan_same_arr(arr, np.random.randint(0, 5000))
# 18.1 µs ± 953 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Is there a way to implement the same logic with numpy, without a C module/Cython/Numba/Python loops, and make it faster?

Demetry Pascal
  • There is no fast way to do a 2D flood-fill. It's recursive and painful. The Wikipedia article has a great discussion: https://en.wikipedia.org/wiki/Flood_fill – Tim Roberts Sep 24 '21 at 20:49
  • Does this answer your question? [Flood fill NumPy Array \`numpy.ndarray\`, i. e. assign new value to element and change neighboring elements of the same value, too](https://stackoverflow.com/questions/56314718/flood-fill-numpy-array-numpy-ndarray-i-e-assign-new-value-to-element-and-ch) – Jérôme Richard Sep 24 '21 at 21:56
  • @TimRoberts I implemented it with deques without any recursion, but it still takes a lot of time for big images because of the pure-Python implementation. I'm trying to find low-level implementations I can use instead of loops – Demetry Pascal Sep 25 '21 at 06:27
  • @JérômeRichard thank you, but it's not quite that, and it has a slow Python implementation – Demetry Pascal Sep 25 '21 at 06:28
  • Could you please provide a 2D example? Are you looking for constant-value islands? – Ehsan Sep 25 '21 at 06:52
  • @Ehsan yes, constant value islands, take a look at my edited question – Demetry Pascal Sep 25 '21 at 09:12
  • @ДмитрийПасько Are the islands necessarily rectangular, or can they be any shape? – Ehsan Sep 25 '21 at 22:30
  • @ДмитрийПасько Also, as far as performance goes, as I mentioned in my solution, it would be fast if you do multiple rounds on it. Now, you do not need to create the array `change` every time; simply put it outside the function and see how much faster the numpy solution is. Please see the edit on my post for it. – Ehsan Sep 25 '21 at 22:38
  • @Ehsan yes, you are right!!! Thank you! – Demetry Pascal Sep 26 '21 at 13:30

3 Answers


I don't know how to solve this problem with numpy, but if you use pandas, you might get the result that you want with this:

import pandas as pd

df = pd.DataFrame(arr, columns=["data"])
# label each run of constant values with an increasing id
df["new"] = df["data"].diff().ne(0).cumsum()
# collect the indices of each constant-value island
[{i[0]: j.index.tolist()} for i, j in df.groupby(["data", "new"], sort=False)]

Output:

[{0: [0, 1, 2]}, {1: [3]}, {0: [4]}, {1: [5, 6, 7, 8]}, {0: [9]}]
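
If you only need the island that contains one particular index, as in the original func(arr, 6), the same run labels can be reused for a direct lookup. This follow-up snippet is my addition, not part of the answer:

import pandas as pd

arr = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]
df = pd.DataFrame(arr, columns=["data"])
df["new"] = df["data"].diff().ne(0).cumsum()

idx = 6
run_id = df["new"].iloc[idx]                       # run label of element idx
indices = df.index[df["new"] == run_id].tolist()   # all elements sharing that run
print(indices)  # [5, 6, 7, 8]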

Here is a numpy solution. I think you can improve it with a little more work:

def func(arr, idx):
    change = np.insert(np.flatnonzero(np.diff(arr)), 0, -1)
    loc = np.searchsorted(change, idx)
    start = change[max(loc-1,0)]+1 if loc<len(change) else change[loc-1]
    end = change[min(loc, len(change)-1)]
    # end is the last index of the run (inclusive), hence end + 1
    return np.arange(start, end + 1)

sample output:

indices = func(arr, 6)
#[5 6 7 8]

This would especially be faster if you have few changes in your arr (relative to the array size) and you are performing multiple of those index searches on the same array. Otherwise, faster solutions come to mind.

Performance comparison: If you are performing the search on the same array multiple times, simply move the first line out of the function like this to avoid recomputation:

change = np.insert(np.flatnonzero(np.diff(arr)), 0, -1)
def func(arr, idx):
    loc = np.searchsorted(change, idx)
    start = change[max(loc-1,0)]+1 if loc<len(change) else change[loc-1]
    end = change[min(loc, len(change)-1)]
    return np.arange(start, end + 1)

For the same input as the OP's example:

np.random.seed(1488)
%timeit func_OP(arr, np.random.randint(0, 5000))
#23.5 µs ± 631 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

np.random.seed(1488)
%timeit func_Ehsan(arr, np.random.randint(0, 5000))
#7.89 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

np.random.seed(1488)
%timeit func_Jérôme_opt1(arr, np.random.randint(0, 5000))
#12.1 µs ± 757 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit func_Jérôme_opt2(arr, np.random.randint(0, 5000))
#3.45 µs ± 179 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

func_Ehsan is the fastest (excluding Numba). Please keep in mind that, again, the performance of these functions varies significantly with the number of changes in the array, the array size, and how many times the function is called on the same array. And of course Numba is faster than all of them (almost 2x faster than func_Ehsan). And if you are going to run it many times, build the groups in O(n) and use a hash map to get the indices in O(1).
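
For that last point, a minimal sketch of precomputing the runs once and answering each query in O(1) could look like this (bounds, group_of and find_run are my own names, not from the answer):

import numpy as np

arr = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 0])

# bounds[i] is the start of run i; bounds[-1] is len(arr); built once in O(n)
bounds = np.concatenate(([0], np.flatnonzero(np.diff(arr)) + 1, [arr.size]))
# group_of[i] is the id of the run containing element i
group_of = np.zeros(arr.size, dtype=np.intp)
group_of[bounds[1:-1]] = 1
group_of = np.cumsum(group_of)

def find_run(idx):
    # O(1) lookup of the half-open slice (start, end) of the run containing idx
    g = group_of[idx]
    return bounds[g], bounds[g + 1]

print(find_run(6))  # (5, 9) -> indices 5, 6, 7, 8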

Ehsan

The main problem is that Numpy is not currently designed to solve this problem efficiently. A fast "find first index of value" or any similar lazy function call would be required to solve this problem efficiently. However, while this feature has been discussed for about 10 years, it is still not available in Numpy. See this post for more information. I do not expect any change soon. Until then, the best solution on relatively big arrays appears to be an iterative one, using relatively slow pure-Python loops and slow Numpy calls/accesses.
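
To illustrate the issue, a loop-free NumPy-only version (my own sketch, not part of this answer) has to scan the whole remainder of the array on both sides for every query, even when the run ends after a few elements:

import numpy as np

def my_func_numpy_only(arr, index):
    # eager scans: every != / argmax call reads the whole slice, not just the run
    val = arr[index]
    after = arr[index:] != val
    end = index + (np.argmax(after) if after.any() else after.size)
    before = arr[:index][::-1] != val
    start = index - (np.argmax(before) if before.any() else before.size)
    return start, end

arr = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 0])
print(my_func_numpy_only(arr, 6))  # (5, 9)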

Besides this, one solution to speed up the computation is to work on small chunks. Here is an implementation:

def my_func_opt1(arr, index):
    val = arr[index]
    size = arr.size
    chunkSize = 128

    # scan forward, chunk by chunk, until a value different from val shows up
    end = index + 1
    while end < size:
        chunk = arr[end:end+chunkSize]
        locations = (chunk != val).nonzero()[0]
        if len(locations) > 0:
            foundCount = locations[0]
            end += foundCount
            break
        else:
            end += len(chunk)
    # scan backward, chunk by chunk, until a value different from val shows up
    start = index
    while start > 0:
        chunk = arr[max(start-chunkSize,0):start]
        locations = (chunk != val).nonzero()[0]
        if len(locations) > 0:
            foundCount = locations[-1]
            start -= len(chunk) - 1 - foundCount  # len(chunk): the first chunk may be shorter than chunkSize
            break
        else:
            start -= len(chunk)

    return start, end

Here are performance results on my machine:

func_Ehsan:   53.8  µs ± 449 ns  per loop (mean ± std. dev. of 7 runs, 10000 loops each)
my_func:      17.5  µs ± 97.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
my_func_opt1:  7.31 µs ± 52.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The thing is, the results are a bit biased since np.random.randint itself takes about 2.01 µs. With this Numpy call excluded from the benchmark, here are the results:

func_Ehsan:   51.8  µs
my_func:      15.5  µs
my_func_opt1:  5.31 µs

As a result, my_func_opt1 is about 3 times faster than my_func. It is very difficult to write faster code, as any Numpy call introduces a relatively big overhead of 0.5-1.0 µs on my machine whatever the array size (e.g. even for empty arrays) due to internal checks.


The following is useful for people who are interested in speeding up the operation and who can use Numba.

The simplest solution consists in using Numba's JIT; more specifically, just adding a decorator to the pure-Python function. This solution is also very fast.

import numba as nb

@nb.njit('UniTuple(i8,2)(f8[::1], i8)')
def my_func_opt2(arr, index):
    val = arr[index]
    size = arr.size

    end = index + 1
    while end < size and arr[end] == val:
        end += 1
    start = index - 1
    while start > -1 and arr[start] == val:
        start -= 1

    return start + 1, end

On my machine my_func_opt2 takes only 0.63 µs (with the random call excluded). As a result, my_func_opt2 is about 25 times faster than my_func. I highly doubt there is a faster solution since any Numpy call on my machine takes at least 0.5 µs and an empty Numba function takes 0.25 µs to call.


Besides this, please note that arr contains double-precision values, which are pretty expensive to compute. It should be faster to use integers if you can. Also, please note that an array of 0 and 1 values can be stored with an int8 dtype, which takes 8 times less memory and is often significantly faster to compute (due to CPU caches, the smaller the array, the faster the computation). You can specify the type during the creation of the array: np.zeros(5000, dtype=np.int8).
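
For example, the benchmark setup with an int8 array instead of the default float64 could look like this (my own illustration, not from the answer):

import numpy as np

np.random.seed(1488)
arr = np.zeros(5000, dtype=np.int8)   # 8x smaller than float64
for x in np.random.randint(0, 5000, size=100):
    arr[x:x+50] = 1

# Note: the explicit Numba signature above would then need to change accordingly,
# e.g. 'UniTuple(i8,2)(i1[::1], i8)' (i1 is Numba's int8), or the signature can
# simply be omitted and left to Numba's type inference.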

Jérôme Richard