0
L = [('key1', 14), ('key2', 20), ('key3', 13), ('key4', 10), ('key5', 11)]

Assume I got a list like the above. The the second element of each tuple conceptually represents the size of a block memory. These block memories are adjacent and sequential. Their relationship could be represented as :

0____14____14+20____14+20+13____14+20+13+10____14+20+13+10+11

Now the user will give me the 'absolute address' and I should return a key to him to decrypt the whole memory block. For instance, if he wants to access the address '10', then I should return key1 for him. If he wants to access '15', and then I should return key2.

My implementation now has to search from the beginning every time. In reality, there are many memory blocks and users want to access all the time. The performance is bad.

L = [('key1', 14), ('key2', 20), ('key3', 13), ('key4', 10), ('key5', 11)]
def get_key(address):
    c = 0
    offset = 0
    for i in L:
        if address < i[1] + offset:
            break
        c += 1
        offset += i[1]
    return c

print(get_key(15))

How could I improve the performance?

The data structure doesn't have to be a list(L). But the known things should be (1) the block size, (2) the key, (3)user will access the 'absolute address'.

As per Burhan Khalid's instruction, the final code is as follows (see also How to find the cumulative sum of numbers in a list?):

from bisect import bisect
from itertools import accumulate

L = [('key1', 14), ('key2', 20), ('key3', 13), ('key4', 10), ('key5', 11)]
markers = [i[0] for i in L]
boundaries = list(accumulate([i[1] for i in L]))

##offsets = []
##offsets.append(boundaries[0])
##for offset in boundaries[1:]:
##   offsets.append(sum(offsets)+offset)

def buckets(value, boundaries, markers):
   i = bisect(boundaries, value)
   return markers[i]

print(buckets(67, boundaries, markers))
Community
  • 1
  • 1
minion
  • 561
  • 4
  • 17
  • `if he wants to access '10', then I should return key1 for him`. I don't get that part. can you explain it a bit more clear please? For the `10` shouldn't be the `key4` ? – Avión Oct 24 '16 at 08:00
  • 1
    @Avión the problem is `if 0<=x<=14 then return "key1" else if 15<=x<=34 then return "key2" ...` – ymonad Oct 24 '16 at 08:04
  • Oh thanks @ymonad I get it now. – Avión Oct 24 '16 at 08:05
  • @Avión, because what I got is the 'block size'. So the 'address' should be accumulated. – minion Oct 24 '16 at 08:06
  • The logic in your existing code isn't quite right. Eg, `get_key(10)` returns 0. Also, you need to accumulate the block sizes into `offset`, but it's more efficient to search a list of the accumulated block sizes. – PM 2Ring Oct 24 '16 at 08:11
  • @ PM 2Ring get_key() return the index of item in the list. – minion Oct 24 '16 at 08:14
  • 1
    Ok. But your assignment to `offset` should be `offset += i[1]` – PM 2Ring Oct 24 '16 at 08:20

2 Answers2

5
  1. Build a list of absolute address of the end point of each item. The second item of this list is guaranteed to be ascending obviously.

L2 = [('key1', 14), ('key2', 34), ('key3', 47), ('key4', 57), ('key5', 68)]

  1. Perform binary search for the requested absolute address on upper list.

Overall time complexity should be O(n) with space complexity of O(n), each query takes O(logn) time.

TurtleIzzy
  • 997
  • 7
  • 14
  • Does Python have a concise syntax to perform the binary search on the list? Like `sort` , which can use a particular element as the key? – minion Oct 24 '16 at 08:21
  • 1
    `bisect.bisect` returns the index to an array, Split your L2 into a list of addresses `lst_addr` and a list of elements `lst_elem`. The result would be `lst_elem[bisect.bisect(lst_addr, query)]` – TurtleIzzy Oct 24 '16 at 08:43
4

You can use an array bisection algorithm to drop the requested amount into the right offset.

The documentation for the bisect module provides the following example which can be modified easily for this:

>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
        i = bisect(breakpoints, score)
        return grades[i]

>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']

Here is how you would use it (untested):

markers = [i[0] for i in initial_list_of_tuples]
boundaries = [i[1] for i in initial_list_of_tuples]
offsets = []
offsets.append(boundaries[0])

for offset in boundaries[1:]:
   offsets.append(sum(offsets)+offset)

def buckets(value, boundaries, markers):
   i = bisect(boundaries, value)
   return markers[i]
Burhan Khalid
  • 169,990
  • 18
  • 245
  • 284