1

I need an algorithm with some data structure in Python that at every step when two new elements e1, e2 are given:

  • finds the insertion positions (conserving the order) of the first and the second given elements.
  • finds the maximum value of the elements in the interval between the two insertion positions.
  • inserts at the previously found insertion position of the second given element, the second given element paired with the maximum value found in the interval plus a constant. Unless the second given element was already there, in which case we only need to update its value if the new value is greater.

And this step must be done in no more than logarithmic time, because when this step is repeated N times the overall worst-case time complexity cannot be far from O(NlogN).

-- For example: my_list = [(2,1), (4,3), (5,7), (9,1)]

As we see, the element 2 is paired with its assigned value 1, the element 4 is paired with value 3, 5 is paired with value 7, and 9 with value 1. And my_list is ordered by the first elements of the pairs.

Now, two elements are given, e1 = 3, e2 = 6.

The insertion position in my_list of (e1, ) == (3, ) is index 1, and the insertion position of (6, ) is index 3.

The maximum value found in the elements of my_list between indices 1 and 3 is value 7, because the maximum value of (4,3), (5,7) is 7.

Imagining that the constant to add is 1 we have: maximum value found + constant == 7 + 1 == 8. And we had e2 == 6, so the pair to insert is (6, 8) at the index 3.

And at the end of this step my_list must be: [(2,1), (4,3), (5,7), (6,8), (9,1)]

-- This question linked here is very similar but differs with my question on the index where the element is inserted. In that question the element is added at the end (appended), in my case, the insertion must be done in a way that preserves the order of the elements such that the start and end of the next arbitrary interval can be found in logarithmic time. This is why I think that, besides using a range minimum query, I will also need to use some advanced data structure like an interval tree or a treap.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
David
  • 33
  • 4
  • If the interval where the maximum value must be found is empty, the step can be skipped. – David Jan 03 '18 at 22:47
  • Are you looking for implementation or what is exactly your are asking, I am not sure that I understand you. :) – Maytham Fahmi Jan 03 '18 at 22:48
  • @maytham-ɯɐɥʇʎɐɯ I am hoping somebody will give me a hint. What data structure to use, whether it can certainly be solved with range minimum queries, etc. – David Jan 03 '18 at 23:03
  • do you need an online algorithm or is batch processing acceptable? – Petar Petrovic Jan 04 '18 at 07:43
  • @PetarPetrovic The arbitrary given two elements in each step are not known from the start of the whole process if that’s what you mean. Although maybe they can be precomputed. – David Jan 04 '18 at 16:09

1 Answers1

1

For this type of job, I usually use an augmented B+ tree. See here for what a B+ tree is: https://en.wikipedia.org/wiki/B%2B_tree

Starting with a B+ tree, I would augment all of the internal nodes so that along with each pointer to a child node, it stores the maximum value associated with all keys in the subtree rooted at that child node.

That extra information makes it easy to calculate the maximum value for any interval in O(log N), and it's easy to maintain as you insert, delete, and modify items in the tree without changing the O(log N) complexity of those operations.

For an in-memory B+ tree, a maximum of 8 or so keys per node is usually performant.

Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87
  • Thank you for the answer! I am reading about B trees and B+ trees and then I’ll try to implement this in Python. – David Jan 04 '18 at 16:09
  • Is there a basic piece of code or skeleton in Python that I can modify to suit my need? The only one I’ve found is “a pure-python B tree and B+ tree implementation” on GitHub, but has more than 500 lines of code, and of very advanced syntax. – David Jan 06 '18 at 03:39
  • I hardly ever use Python, but if you mean this one: https://gist.github.com/teepark/572734 then it looks about right. Get rid of the BTree and keep the B+Tree. Get rid of remove/shrink/merge stuff, because you only need to add. There will only a a couple hundred lines left – Matt Timmermans Jan 06 '18 at 03:49