I have a large array with increasing values - like this:
array = [0, 1, 6, 6, 12, 13, 22, ..., 92939, 92940]
and I want to use interpolation-search algorithm on it. Size of the array is variable, new elements are added to the end of the array.
I need to find index of some element, let's call it X.
Y = find(X in array)
Y must be an index of the element from the array such that array[Y] >= X
find
can be implemented with binary search but for some complicated reasons I want to implemented it using interpolation search. Interpolation search tries to guess correct position of the X by looking at the bounds of the array. If first array value is 0 and last is 100 and I want to find position of the value 25 than, if array length is 1000, I need to look at value at index 250 first. This works as a charm if values of the array is evenly distributed. But if they not evenly distributed, interpolation search can work slower than binary search (there is some optimizations possible).
I'm trying to speed up search in such cases using Count-Min Sketch data structure. When I appending new element to the array I just add some data to count-min sketch data-structure.
Z = 1005000
elements_before_Z = len(array)
array.append(Z)
count_min_sketch.add(Z, elements_before_Z)
# Z is the key and elenents_before_Z - is count
using this approach I can guess position of the searched element X approximately. This can result in search speedup if guess is correct, but I've ran into some problems.
I don't know if X is in array and my
count_min_sketch
have seen this value. If this is the case I can get correct value out ofcount_min_sketch
datastructure. If it's not - I will get 0 or some other value (worst case scenario).Collisions. If value X have been seen by my
count_min_sketch
object - than I get back correct value or larger value. If count min sketch used for something like counting word occurrence in document - this is not a problem because collisions is rare and error is less or equal than number of collisions (it usually used like this: count_min_sketch.add(Z, 1)). In my case, every collision can result in large error, because I usually add large numbers for every key.
Is it possible to use count-min sketch in such way (adding large number of entries every time)?