7

I have a list of items that should be inserted in a list-like data structure one after the other, and I have the indexes at which each item should be inserted. For example:

items = ['itemX', 'itemY', 'itemZ']
indexes = [0, 0, 1]

The expected result is to have a list like this: result = ['itemY', 'itemZ', 'itemX'].

I'm able to get this result with this simple approach:

result = []
for index, item in zip(indexes, items):
    result.insert(index, item)

However, this is a very slow approach once lists become huge (the complexity is O(n^2)). Is there any (relatively simple to implement) way to improve my basic approach? I guess I have to look at other data structures while I insert elements and finally transform that data structure into my result list. Are trees a good option? Insert could be done maybe in O(log(n)) (instead of O(n)), but which specific tree-like structure should I use?

Or maybe something good can be achieved by just looking at all the indexes together (instead of using them one by one).

This is probably the worst case for my slow approach (always insert items at the beginning of the list):

n = 10**6 # some large number
items = list(range(n))
indexes = [0] * n
Riccardo Bucco
  • 13,980
  • 4
  • 22
  • 50
  • Your "probably worst case" would be very easy for a human, because in that particular case, all we need to do is to insert the items in reverse order. This is because the last item inserted at position 0 will really end up at position 0. This could be the basis for an algorithm: try to "guess" the true final index of an item. An item `x` inserted at index `k` will have true final index `k + m`, where `m` is the number of items inserted later than `x` and with an index `<= k`. – Stef Aug 26 '21 at 14:22
  • 1
    Actually that's not true and it's more complicated than that. Sorry. – Stef Aug 26 '21 at 14:24
  • haven't completely thought this out, but how about working in reverse? make a new list of the same size filled with `None` now you can insert the last item to the correct index, then go backwards in the list and all you need to know is how many insertions happened before your wanted index and if the cell is already taken you look ahead untill you find the next open cell (with `None` as a value) – AntiMatterDynamite Aug 26 '21 at 15:19
  • @AntiMatterDynamite That works, but has the same asymptotic complexity `O(n^2)`. For each index you might have to skip n occupied indexes on average. – user2390182 Aug 26 '21 at 15:23
  • same complexity but not the same overhead, since you dont need to copy the entire array for every insert – AntiMatterDynamite Aug 26 '21 at 15:35

2 Answers2

1

Here's python code for a treap with a size decoration that allows insertion at specific indexes, and reordering of whole contiguous sections. It was adapted from C++ code, Kimiyuki Onaka's solution to the Hackerrank problem, "Give Me the Order." (I cannot guarantee that this adaptation is bug free -- a copy of the original code is available in the description of this question.)

import random

class Treap:
  def __init__(self, value=None):
    self.value = value
    self.key = random.random()
    self.size = 1
    self.left = None
    self.right = None


def size(t):
  return t.size if t else 0


def update(t):
  if t:
    t.size = 1 + size(t.left) + size(t.right)
  return t


def merge(a, b):
  if not a:
    return b
  if not b:
    return a

  if a.key > b.key:
    a.right = merge(a.right, b)
    return update(a)
  else:
    b.left = merge(a, b.left)
    return update(b)


def split(t, i):
  if not t:
    return None, None

  if i <= size(t.left):
    u, t.left = split(t.left, i)
    return u, update(t)
  else:
    t.right, u = split(t.right, i - size(t.left) - 1)
    return update(t), u


def insert(t, i, value):
  left, right = split(t, i)
  u = Treap(value)
  return merge(merge(left, u), right)


def inorder(treap):
  if not treap:
    return

  if treap.left:
    inorder(treap.left)

  print(treap.value)

  if treap.right:
    inorder(treap.right)

Output:

lst = ['itemX', 'itemY', 'itemZ']
idxs = [0, 0, 1]

t = None

for i in range(len(lst)):
  t = insert(t, idxs[i], lst[i])

inorder(t)

"""
itemY
itemZ
itemX
"""
גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61
  • 1
    Thanks, this seems to work well (not sure it's "optimal" though). I will study your solution and if no other people suggest something else I will mark your answer as accepted :) Some data in the meanwhile: if I run the worst case using my approach and setting n=10**6 I need to wait about ~200 seconds; instead, your approach only needs 18 seconds. Your approach is of course a bit slower when the input is small – Riccardo Bucco Aug 26 '21 at 18:08
  • 2
    This idea is one way of building a data structure called an ***order statistics tree***, an augmented binary search tree that supports insertion at specific indices. You may want to search for that term online to see if anything else you find helps out. – templatetypedef Aug 26 '21 at 20:14
1

You could use SortedList, neutralize its sorting with a constant key function, and only use it for its fast insertions. Needs version 1.5.10 or older, as insert got removed.

def insertions(indexes, items):
    tmp = SortedList(key=lambda _: 0)
    for index, item in zip(indexes, items):
        tmp.insert(index, item)
    return list(tmp)

(I imagine there's also something like it but without sorting that needs to be neutralized, sortedcontainers is just something I know.)

Benchmark results:

               indexes =   [0] * 10**6     [randint(0, i) for i in range(10**6)]
--------------------------------------------------------------------------------
original                  1540 seconds     759 seconds
neutralized SortedList      13 seconds      31 seconds
sorted mediants            201 seconds     249 seconds
sorted mediants optimized   42 seconds      72 seconds

Those last two solutions are another idea:

Use a SortedList the normal way, but annotate each item with a fraction from 0 to 1 (and order by that). To insert between two items, use those items' mediant.

from sortedcontainers import SortedList
from fractions import Fraction

def insertions(indexes, items):
    xs = SortedList([(Fraction(0), None), (Fraction(1), None)])
    for index, item in zip(indexes, items):
        a, c = xs[index][0].as_integer_ratio()
        b, d = xs[index + 1][0].as_integer_ratio()
        xs.add((Fraction(a+b, c+d), item))
    return [item for _, item in xs[1:-1]]

Optimized version doing fractions myself:

from sortedcontainers import SortedList

class X(tuple):
    def __lt__(self, other):
        return self[0] * other[1] < self[1] * other[0]

def insertions(indexes, items):
    xs = SortedList([X((0, 1, None)), X((1, 1, None))])
    for index, item in zip(indexes, items):
        L, R = xs[index : index+2]
        xs.add(X((L[0] + R[0], L[1] + R[1], item)))
    return [x[2] for x in xs[1:-1]]
no comment
  • 6,381
  • 4
  • 12
  • 30