Efficient function mapping with arguments in numpy

Question

I am trying to create a heightmap by interpolating between a bunch of heights at certain points in an area. To process the whole image, I have the following code snippet:


map_ = np.zeros((img_width, img_height))

for x in range(img_width):
    for y in range(img_height):
        map_[x, y] = calculate_height(set(points.items()), x, y)

This is calculate_height:

def distance(x1, y1, x2, y2) -> float:
    return np.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


def calculate_height(points: set, x, y) -> float:
    total = 0
    dists = {}
    for pos, h in points:
        d = distance(pos[0], pos[1], x, y)
        if x == pos[0] and y == pos[1]:
            return h
        d = 1 / (d ** 2)
        dists[pos] = d
        total += d

    r = 0
    for pos, h in points:
        ratio = dists[pos] / total
        r += ratio * h

    r = r
    return r

This snippet works perfectly, but if the image is too big, it takes a long time to process, because this is O(n^2). The problem with this, is that a "too big" image is 800 600, and it takes almost a minute to process, which to me seems a bit excessive.

My goal is not to reduce the time complexity from O(n^2), but to reduce the time it takes to process images, so that I can get decently sized images in a reasonable amount of time.

I found this post, but I couldn't really try it out because it's all for a 1D array, and I have a 2D array, and I also need to pass the coordinates of each point and the set of existing points to the calculate_height function. What can I try to optimize this code snippet?

Edit: Moving set(points.items) out of the loop as @thethiny suggested was a HUGE improvement. I had no idea it was such a heavy thing to do. This makes it fast enough for me, but feel free to add more suggestions for the next people to come by!

Edit 2: I have further optimized this processing by including the following changes:

    # The first for loop inside the calculate_distance function
    for pos, h in points:
        d2 = distance2(pos[0], pos[1], x, y)
        if x == pos[0] and y == pos[1]:
            return h
        d2 = d2 ** -1 # 1 / (d ** 2) == d ** -2 == d2 ** -1
        dists[pos] = d2 # Having the square of d on these two lines makes no difference
        total += d2

This reduced execution time for a 200x200 image from 1.57 seconds to 0.76 seconds. The 800x600 image mentioned earlier now takes 6.13 seconds to process :D

This is what points looks like (as requested by @norok12):

# Hints added for easier understanding, I know this doesn't run
points: dict[tuple[int, int], float] = {
    (x: int, y: int): height: float,
    (x: int, y: int): height: float,
    (x: int, y: int): height: float
}
# The amount of points varies between datasets, so I can't provide a useful range other than [3, inf)

I'm having a hard time guessing what `calculate_height(set(points.items()), x, y)` does. Could you elaborate on that? You may be able to vectorize this whole thing and get your processin times down by a factor of 30 or more. — philosofool, Sep 14 '22 at 00:03
You can start by taking the `set(points.items())` outside of the loop since that's always the same code. Second you should take this to the Code Review website where the people there are more prone to answering optimization help. — thethiny, Sep 14 '22 at 00:03
Finally, I recommend you use numba.gvectorize which deals with `calculate_height` on different points since they don't clash with each other. Learn about `numba` and it'll save you lots of time. — thethiny, Sep 14 '22 at 00:03
@thethiny I have included the implementation of `calculate_height`. It's just 2D interpolation — Ciro García, Sep 14 '22 at 00:06
It's still the same concept, numba will do these calculations in parallel no matter what your code does. — thethiny, Sep 14 '22 at 00:07
Approximately how many points of interest are there in the set `points`? — AirSquid, Sep 14 '22 at 00:35
@AirSquid This is for generating heatmaps based on geographical data. The amount of points depends on the dataset, so I can't really provide a useful range. At least 4 or 5 I guess. For the time length mentioned in the question, there were only 7 points, so not that many — Ciro García, Sep 14 '22 at 00:43
I'm thinking that somebody who is more adept at `numpy` than I am could help you vectorize the heavy lifting of the math you have with the distance formula (which includes the *expensive* square root) and roll in the `1/d**2` as well. Right now, you are doing individual function calls there, but I think it is possible to vectorize that with numpy from the width * height * 2 matrix and the points * 2 matrix — AirSquid, Sep 14 '22 at 01:36
@norok2 `points` is just a dictionary that looks like this: `points = { (x: int, y: int): height: float, (x: int, y: int): height: float, (x: int, y: int): height: float }` I'll add it to the question for formatting — Ciro García, Sep 15 '22 at 05:30

score 1 · Answer 1 · answered Sep 14 '22 at 03:04

There's a few problems with your implementation.

Essentially what you're implementing is approximation using radial basis functions.

The usual algorithm for that looks like:

sum_w = 0
sum_wv = 0
for p,v in points.items():
   d = distance(p,x)
   w = 1.0 / (d*d)
   sum_w += w
   sum_wv += w*v
return sum_wv / sum_w

Your code has some extra logic for bailing out if p==x - which is good. But it also allocates an array of distances - which this single loop form does not need.

This brings execution of an example in a workbook from 13s to 12s.

The next thing to note is that collapsing the points dict into an numpy array gives us the chance to use numpy functions.

points_array = np.array([(p[0][0],p[0][1],p[1]) for p in points.items()]).astype(np.float32)

Then we can write the function as

def calculate_height_fast(points, x, y) -> float:
    dx = points[:,0] - x
    dy = points[:,1] - y
    r = np.hypot(dx,dy)
    w = 1.0 / (r*r)
    sum_w = np.sum(w)
    return np.sum(points[:,2] * w) / np.sum(w)

This brings our time down to 658ms. But we can do better yet...

Since we're now using numpy functions we can apply numba.njit to JIT compile our function.

@numba.njit
def calculate_height_fast(points, x, y) -> float:
    dx = points[:,0] - x
    dy = points[:,1] - y
    r = np.hypot(dx,dy)
    w = 1.0 / (r*r)
    sum_w = np.sum(w)
    return np.sum(points[:,2] * w) / np.sum(w)

This was giving me 105ms (after the function had been run once to ensure it got compiled).

This is a speed up of 130x over the original implementation (for my data)

You can see the full implementations here

You can probably get even faster by removing the unused `sum_w = np.sum(w)` line and moving back to spelled out looping. — norok2, Sep 14 '22 at 09:58

score 0 · Answer 2 · answered Sep 14 '22 at 09:57

This really is a small addition to @MichaelAnderson's detailed answer.

Probably calculate_height_fast() can get faster by optimizing a bit more with explicit looping:

@numba.njit
def calculate_height_faster(points, x, y) -> float:
    dx = points[:, 0] - x
    dy = points[:, 1] - y
    r = np.hypot(dx, dy)
    # compute weighted average
    n = r.size
    sum_w = sum_wp = 0
    for i in range(n):
        w = 1.0 / (r[i] * r[i])
        sum_w += w
        sum_wp += points[i, 2] * w
    return sum_wp / sum_w

Another advantage of using explicit looping is that you can reintroduce the exiting early if `r==0`. — Michael Anderson, Sep 14 '22 at 10:07

Efficient function mapping with arguments in numpy

2 Answers2