Inteligent loop over dictionnary python

Question

I have dictionnary that look like

position_dictionnary = {(0,0): <PositionObject1>, (0,1): <PositionObject2>, ...  (x,y): <PositionObjectn>}

I have a function that need to return me the list of the PositionObject in a specific range arround a given position. For now i have this piece of code that work fine for me :

def getSurrounding(center_position, range, position_dictionnary):
    for x,y in position_dictionnary:
         if abs(center_position.x - x) + abs(center_position.y - y) < range:
             yield position_dictionnary[(x,y)]

But when the dictionnary become big it become far too long, so i'm asking if there is a way to directly loop over the correct index of he dictionnary or another way that i don't see to make it run faster. (If the solution is not good practice proof but is faster it's ok)

Would a 2D list work better? I suppose it would work if x and y are always integers — Nathan, Jan 21 '20 at 14:53
Might use too much memory though, if the keys are very sparse. Probably it's worth taking a look at https://stackoverflow.com/questions/6179635/what-is-a-good-data-structure-for-storing-and-searching-2d-spatial-coordinates-i — GPhilo, Jan 21 '20 at 14:54
create an ordered dictionary based on the co-ordinates. binary search amongst the keys and the expand onto both directions with this range as condition. — Albin Paul, Jan 21 '20 at 14:57
@AlbinPaul What would you search search for amongst the keys? I was thinking of sorting the elements as well, but it isn't clear what you would search for. — AirSquid, Jan 21 '20 at 15:05
@JeffH Its gonna be a bit tricky to search. I was thinking to take a measure of `x - centerposition.x + y - centerposition.y` as a condition to search if the value is less that range the we can take one half else the other. And it if its satisfies the range we stop and expand both sides. — Albin Paul, Jan 21 '20 at 15:13
since your approach is O(n) I can't see a better solution, only if you have some details about x, y or your range (is negative, positive) etc. — kederrac, Jan 21 '20 at 17:39

r.ook · Answer 1 · 2020-01-21T15:12:02.630

1

First, don't overshadow the built-in keyword range. You'll lose access to the built-in object over the course of the same scope. Luckily for you, in this case you only shadowed it within the function scope. But if you did this on a global scope you will be asking for unnecessary trouble.

Now, onto your question. Instead of iterating through the entire dictionary, use range as your main condition to retrieve the keys that match your condition from the dictionary. It'll be significantly smaller. e.g.:

def getSurrounding(center_position, rng, position_dictionary):
    for x in range(-rng, rng):
        for y in range(-rng, rng):

            # First, check if the dictionary returns the coord
            position = position_dictionary.get((x, y))

            # If the position was retrieved, yield it back.
            if position:
                yield position

With this method, say you have a dictionary size of 50 x 50 coord (2500 total), and you want to check a range of 10 radius from your center position, you would only iterating 20 x 20 (400 total) times instead of 2500 times.

Of course, at a certain point, when your rng is larger than a radius of 25, you might want to just use your existing method instead.

edited Jan 21 '20 at 15:12

answered Jan 21 '20 at 15:06

r.ook

13,466
2
22
39

1

This assumes the positions are strictly integer, which isn't explicit, but may be accurate Also, if the field of objects is sparse and the range of search is large, this is could become quite inefficient. If the range is 10, you are going to perform 400 `gets` to perhaps find only one or 2 points, so there is some tradespace here based on parameters, I think – AirSquid Jan 21 '20 at 15:11
@JeffH exactly - I added my edit to point out the same. Eventually it reaches a point where the original approach will be more efficient. – r.ook Jan 21 '20 at 15:15
your approach is O(n^2) while OP solution is O(n) time complexity, just tested with 1000 items in `position_dictionnary`, got +1000 times slower for the worst-case scenario – kederrac Jan 21 '20 at 16:02
@rusu_ro1 I am aware of that, the benefits will be outpaced by the original solution at a certain threshold. That said, the `n` in discussion here is different though. an O(n) approach of `n=2500` would still be slower than an O(n^2) where `n=20`. As I've said, it depends on the use case, which I'm just *assuming* the `range` will be significantly smaller than the original set. – r.ook Jan 21 '20 at 16:25

score 1 · Accepted Answer · answered Jan 22 '20 at 03:01

You can do this in sub-linear time by double-sorting the x & y coordinates. This, of course, assumes that the dictionary of points is fairly static, because there is obviously a (small) cost for adding or moving things.

Below involves using several sorted containers, so each query to getSurrounding is going to invoke 4 bisection searches, which is still O(log(N)). The added benefit is that this should work for float values for locations, but I have not really thought that through.

from sortedcollections import SortedDict, SortedList

pos_dict = {(0,0): 'a',
            (1,2): 'b',
            (1,3): 'c',
            (2,2): 'd',
            (2,6): 'e',
            (3,2): 'f',
            (4,4): 'g',
            (5,5): 'h'}

sd = SortedDict()

# put everything into a sorted dictionary
for x, y in pos_dict:
    temp = sd.get(x, SortedList())
    temp.add(y)
    sd[x] = temp

# we look for candidate x values in the sorted dictionary 
# and be clever with the Manhattan
# distance to find the y values

rng = 2
ctr = (2,2)
points_in_range = []

left_x = ctr[0] - rng
right_x = ctr[0] + rng

low_y = ctr[1] - rng
hi_y = ctr[1] + rng

x_vals = sd.keys()[sd.bisect(left_x) : sd.bisect(right_x)]

# get y values within the remaining Manhattan distance and construct tuples from (x, y)
for x in x_vals:
    temp = sd.get(x)
    low_y_lim = low_y + abs(x-ctr[0])
    hi_y_lim = hi_y - abs(x-ctr[0])
    y_vals = temp[temp.bisect(low_y_lim) : temp.bisect(hi_y_lim)]
    points_in_range.extend([(x, y) for y in y_vals])

print(points_in_range)

for p in points_in_range:
    print(f'{pos_dict.get(p)} is in range at location {p}')

Yields:

[(1, 2), (1, 3), (2, 2), (3, 2)]
b is in range at location (1, 2)
c is in range at location (1, 3)
d is in range at location (2, 2)
f is in range at location (3, 2)

Other Notions...

Depending on size of dictionary and values of range, you might be able to get to near linear time. If the value of rng is small, then fabricating a set of points that are "in range" and intersecting it with a set of the keys in your dictionary is very easy and is near linear if rng is "small."

set_of_in_rng = {(x,y)      for x in range(ctr[0]-rng, ctr[0]+rng+1) 
                            for y in range(low_y+abs(x-2), hi_y-abs(x-2)+1)}

points_in_range = set_of_in_rng.intersection(set(pos_dict.keys()))

for p in points_in_range:
    print(f'{pos_dict.get(p)} is in range at location {p}')

Yields the same result:

b is in range at location (1, 2)
f is in range at location (3, 2)
c is in range at location (1, 3)
d is in range at location (2, 2)

Inteligent loop over dictionnary python

2 Answers2