3

I have a list of logging entries of the form:

[{'time': 199920331000, 'message': 'message1'}, {'time': 199920331001, 'message': 'message2'}...]

where the value of time is always increasing through the list. If i want to get logs later than a given timestamp, I could walk the elements until I see a time stamp larger than the given timestamp:

def getLog(timestamp):
    global logs
    for x in range(len(logs)):
        if logs[x]['time'] > timestamp:
            return logs[x:]
    return []

I suppose there is already a quick search mechanism in python 3 but don't know where to look.

msw
  • 42,753
  • 9
  • 87
  • 112
Max
  • 7,957
  • 10
  • 33
  • 39

3 Answers3

4

If I understand you correctly, you are looking for the bisect module, which implements an efficient algorithm for finding the point where a value in a sorted list is larger or smaller than a given value.

Your log entries need to be a class that implements some form of ordering though. Something like this:

from functools import total_ordering

@total_ordering
class LogEntry(object):
    def __init__(self, time, message):
        self.time = time
        self.message = message

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return self.time == other.time and self.message == other.message

    def __lt__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        if self.time == other.time:
            return self.message < other.message
        return self.time < other.time

These LogEntry classes are orderable (with the help of the functools.total_ordering class decorator), and thus the bisect module knows what entries have a 'lower' value than other values.

Your function then becomes:

def getLog(timestamp):
    dummy_entry = LogEntry(timestamp, '')
    index = bisect.bisect_right(logs, dummy_entry)
    return logs[index:]

Note that we don't need to declare logs global as you are not assigning to it.

mgilson
  • 300,191
  • 65
  • 633
  • 696
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
2

Given that Python tries b.__gt__(a) when a.__lt__(b) is not implemented you don't need to change the log entries' class, it should be sufficient to provide a smart enough key:

import bisect
from functools import total_ordering
from operator import itemgetter

log = [
    {'time': 199920331000, 'message': 'message1'},
    {'time': 199920331001, 'message': 'message2'},
    # ...
]

@total_ordering
class Key(object):
    def __init__(self, keyfunc, keyval):
        self.keyfunc = keyfunc
        self.keyval = keyval

    def __eq__(self, other):
        return self.keyval == self.keyfunc(other)

    def __lt__(self, other):
        return self.keyval < self.keyfunc(other)

start = bisect.bisect(log, Key(itemgetter("time"), 199920331000))
print log[start:]

Alternatively you can wrap a view around the list of dicts:

def keyed(items, key):
    class View(object):
        def __getitem__(self, index):
            return key(items[index])
        def __len__(self):
            return len(items)
    return View()

start = bisect.bisect(keyed(log, itemgetter("time")), 199920331000)
print log[start:]

(This is stripped down from Smart way to delete tuples)

Community
  • 1
  • 1
1

If you know the time always increases, you can guarantee your list is sorted. Then I'd use the answer from here and try to adapt it, like this:

def binary_search(log_list, timestamp, lo=0, hi=None):
    if hi is None:
        hi = len(log_list)
    while lo < hi:
        mid = (lo+hi)//2
        midval = log_list[mid]['time']
        if midval < timestamp:
            lo = mid+1
        elif midval > timestamp: 
            hi = mid
        else:
            return mid
    return -1

(haven't tested though)

Community
  • 1
  • 1
E.Z.
  • 6,393
  • 11
  • 42
  • 69