0

In Python, what is the most efficient way to search in a list which is sorted according to an attribute? See below for a more precise question.

Example:

class Any (object):
    def __init__(self, attr_a, attr_b):
        self.attr_a = attr_a
        self.attr_b = attr_b

L = [Any(-3, 4), Any(-2, 1), Any(0, 2), Any(2, 1), Any(5, 6), Any(6, 3), Any(8, 2), Any(10, 1), Any(13, 5), Any(14, 3)]

L is sorted according to the attribute attr_a. All Any instances of the list L have a different attr_a value. What is the most efficient way to search for the value of attr_b of the object which attr_a equals x?

Remi.b
  • 17,389
  • 28
  • 87
  • 168
  • The most efficient searching method for a sorted list is the bisection algorithm. See [This post](http://stackoverflow.com/questions/212358/binary-search-in-python) for an implementation in python (take a look at the one with the most votes). You will need to modify it to find the attr_a, then get attr_b after. – SethMMorton Aug 01 '13 at 17:11

2 Answers2

3

You'd adopt a binary search to find the right Any object for your attr_a value. The bisect module provides a starting point:

def bisect_left(a, x, lo=0, hi=None, key=None):
    if key is None: key = lambda v: v
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if key(a[mid]) < x: lo = mid+1
        else: hi = mid
    return lo

The only thing I did was add a key function to the signature here. key takes a callable that returns the value against we are bisecting.

Now you can use bisection to find your Any index:

from operator import attrgetter

index = bisect_left(L, x, key=attrgetter('attr_a'))

This returns either the index of a matching Any, or the index of the next Any object whose attr_a value is higher than x. You may need to test and/or adjust the algorithm for those cases. For example, you could verify that attr_a does indeed match the desired value:

def find_by_x(L, x, key):
    index = bisect_left(L, x, key=key)
    if key(L[index]) != x:
        raise IndexError('{} not found'.format(x))
    return L[index]

Demo:

>>> from operator import attrgetter
>>> L = [Any(-3, 4), Any(-2, 1), Any(0, 2), Any(2, 1), Any(5, 6), Any(6, 3), Any(8, 2), Any(10, 1), Any(13, 5), Any(14, 3)]
>>> x = 6
>>> bisect_left(L, x, key=attrgetter('attr_a'))
5
>>> L[bisect_left(L, x, key=attrgetter('attr_a'))].attr_b
3
>>> find_by_x(L, x, key=attrgetter('attr_a')).attr_b
3
>>> x = 12
>>> find_by_x(L, x, key=attrgetter('attr_a')).attr_b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in find_by_x
IndexError: 12 not found
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • That's a really great solution! – Pawel Miech Aug 01 '13 at 17:23
  • Thanks a lot! great solution. I'm kinda affraid though that I could get the index (and so `attr_b`) of the `attr_a` value that follows `x`! In theory it shouldn't be a problem but we never know... How could I avoid that? How could I get an error message if no `Any` instance has an `attr_a` which equals `x`? – Remi.b Aug 01 '13 at 17:31
  • You'd have to verify that the object at that index has the expected value for `attr_a`. – Martijn Pieters Aug 01 '13 at 17:32
  • I've added a sample function that verifies first. – Martijn Pieters Aug 01 '13 at 17:35
  • Well, that's right I can just check that afterward. Thanks a lot for this great answer @MartijnPieters – Remi.b Aug 01 '13 at 17:37
0

You want to use module bisect and its bisect_left function. In your case you would need to extract list of the keys before using it, see section Other examples for detailed explanation. If this is not acceptable then you can implement your own version of binary search, it's a simple algorithm.

piokuc
  • 25,594
  • 11
  • 72
  • 102