1

Useful information:

For information on how to sort a list of various data types see: How to sort (list/tuple) of lists/tuples?

.. and for information on how to perform a binary search on a sorted list see: Binary search (bisection) in Python

My question:

How can you neatly apply binary search (or another log(n) search algorithm) to a list of some data type, where the key is a inner-component of the data type itself? To keep the question simple we can use a list of tuples as an example:

x = [("a", 1), ("b",2), ("c",3)]
binary_search(x, "b") # search for "b", should return 1
# note how we are NOT searching for ("b",2) yet we want ("b",2) returned anyways

To simplify even further: we only need to return a single search result, not multiple if for example ("b",2) and ("b",3) both existed.

Better yet:

How can we modify the following simple code to perform the above operation?

from bisect import bisect_left

def binary_search(a, x, lo=0, hi=None):  # can't use a to specify default for hi
    hi = hi if hi is not None else len(a)  # hi defaults to len(a)   
    pos = bisect_left(a, x, lo, hi)  # find insertion position
    return (pos if pos != hi and a[pos] == x else -1)  # don't walk off the end

PLEASE NOTE: I am not looking for the complete algorithm itself. Rather, I am looking for the application of some of Python's standard(ish) libraries, and/or Python's other functionalities so that I can easily search a sorted list of some arbitrary data type at any time.

Thanks

Stephen Lasky
  • 417
  • 5
  • 18

2 Answers2

2

Take advantage of how lexicographic ordering deals with tuples of unequal length:

# bisect_right would also work
index = bisect.bisect_left(x, ('b',))

It may sometimes be convenient to feed a custom sequence type to bisect:

class KeyList(object):
    # bisect doesn't accept a key function, so we build the key into our sequence.
    def __init__(self, l, key):
        self.l = l
        self.key = key
    def __len__(self):
        return len(self.l)
    def __getitem__(self, index):
        return self.key(self.l[index])

import operator
# bisect_right would *not* work for this one.
index = bisect.bisect_left(KeyList(x, operator.itemgetter(0)), 'b')
user2357112
  • 260,549
  • 28
  • 431
  • 505
  • Modifying line 5 of the simple binary search algorithm to: pos = bisect_left(a, (x,), lo, hi) # find insertion position ... does not have the desired effect and returns a -1 not found. – Stephen Lasky Jul 18 '17 at 00:09
  • @StephenLasky: I'm just showing you how to find the index. Your `binary_search` function has other problems; for example, it's comparing `x` to `a[pos]` directly, so it doesn't understand that it's found the right entry. – user2357112 Jul 18 '17 at 00:12
  • My mistake, everything works perfectly. Out of curiosity: how could you modify the above property to search for say the Nth position? – Stephen Lasky Jul 18 '17 at 00:15
  • @StephenLasky: I'm not sure what you're talking about. – user2357112 Jul 18 '17 at 01:13
1

What about converting the list of tuples to a dict?

>>> d = dict([("a", 1), ("b",2), ("c",3)])
>>> d['b'] # 2
SuperShoot
  • 9,880
  • 2
  • 38
  • 55
  • The problem here is that I am dealing with massive lists (>1,000,000) and this kind of operation would simply be too slow. I appreciate your response though. – Stephen Lasky Jul 18 '17 at 00:03