I am looking for a suitably efficient storage structure for processing data that arrives in random order, but must be processed and/or removed from the stack in a specified order.
To make this clearer:
Each item x has an index i, a time-stamp t and is processed as follows (assuming the storage structure is already populated).
repeat
1) Process (remove) the item with the smallest time stamp.
2) Add new items (0 or more).
3) Remove (0 or more) items (referenced by their index).
until false
It is guaranteed that each item will have unique timestamp and index. But it is impossible to predict how many items will be added in step 2) until step 1) is completed, or how many items will be removed in step 3) until steps 1) and 2) are completed within each loop of the algorithm. The distribution of timestamps for the incoming items cannot be predicted (except that new timestamps will be in the 'future'), and may vary with time - i.e. there could be a bunch of randomly distributed timestamps followed by a bunch with timestamps all greater (or less than) any of the items remaining in the list. There will be a maximum number of elements waiting to be processed at any one time, but this could be fairly large ~10^6-10^8, and I need to process them as fast as possible - note that the actual processing time is fairly small so for large sets of data I expect that the 'scheduling' will dominate the rate that I can process data.
If I add each item to a linked sorted list as it arrives in step 2), then step 1) is O(0), but step 2) is O(n). If I use a binary tree then step 1) is still O(1) and now step 2) is initially O(log n), but if the timestamps are not nicely distributed the tree could become unbalanced very quickly which would slow step 2) down significantly (eventually no better than O(n) if I don't regularly re-balance the tree).
My guess is that something along the lines of a binary tree with regular re-balancing should provide O(log n) if the re-balancing is done at the right intervals, but I assume that this sort of problem is well-solved, so could someone direct me to a suitable reference or give me a little advice to help me avoid re-inventing the wheel.