1
def find_closest(data, target, key = lambda x:f(x))

This is my function definition where data is set of values, and I want to find the value that evaluates the closest to target in as few evaluations as possible, i.e. abs(target-f(x)) is minimum. f(x) is monotonic.

I've heard that binary search can do this in O(log(n)) time, is there a library implementation in python? Are there more efficient search algorithms?

EDIT: I'm looking to minimize complexity in terms of evaluating f(x) because that's the expensive part. I want to find the x in data that when evaluated with f(x), comes closest to the target. data is in the domain of f, target is in the range of f. Yes, data can be sorted quickly.

siamii
  • 23,374
  • 28
  • 93
  • 143

3 Answers3

0

You can use the utilities in the bisect module. You will have to evaluate x on data though, i.e. list(f(x) for x in data) to get a monotonic / sorted list to bisect.

I am not aware of a binary search in the standard library that works directly on f and data.

Alexander Gessler
  • 45,603
  • 7
  • 82
  • 122
0

If the data presented is already sorted and the function is strctly monotonic, apply the function f on the data and then perform a binary search using bisect.bisect

import bisect
def find_closest(data, target, key = f):

    data = map(f, data)
    if f(0) > f(1):
        data = [-e for e in data]
    try:
        return data[bisect.bisect_left(data, target)]
    except IndexError:
        return data[-1]
Abhijit
  • 62,056
  • 18
  • 131
  • 204
0

Use bisect_left() method to find lower bound. Bisect_left accepts a random-access list of elements, to avoid calculating all of them you can use lazy collection of calculated function values with __len__ and __getitem__ methods defined. Carefully check return value for border conditions. Your heavy calculation will be called O(log(N) + 1) = O(log(N)) times.

from bisect import bisect_left
from collections import defaultdict

class Cache(defaultdict):
    def __init__(self, method):
        self.method = method
    def __missing__(self, key):
        return self.method(key)

class MappedList(object):
    def __init__(self, method, input):
        self.method = method
        self.input = input
        self.cache = Cache(method)
    def __len__(self):
        return len(self.input)
    def __getitem__(self, i):
        return self.cache[input[i]]

def find_closest(data, target, key = lambda x:x):
    s = sorted(data)
    evaluated = MappedList(key, s)
    index = bisect_left(evaluated, target)
    if index == 0:
        return data[0]
    if index == len(data):
        return data[index-1]
    if target - evaluated[index-1] <= evaluated[index] - target:
        return data[index-1]
    else:
        return data[index]
Community
  • 1
  • 1
Basilevs
  • 22,440
  • 15
  • 57
  • 102
  • This looks good, didn't know that map was lazy. Here you're returning `f(x)`, right, instead of `x`. Just change it to `return data[index]`? – siamii Aug 26 '14 at 16:44
  • `imap` from your link doesn't support random access – siamii Aug 26 '14 at 17:14
  • [Check out mine](http://ideone.com/krSas4). Now with caching! :) Only two or three calculations for our 6-element input data. – Basilevs Aug 26 '14 at 17:39