3

My problem is to select a data structure for an event-based simulation.

A number N of future events are maintained together with their time of occurrence. N is fixed or at least bounded, ranging from 10 to maybe 10000. Each event has a unique ID by which it can be retrieved in principle, in addition to the time.

In a loop, the following happens. The next-occuring event is removed and executed, and a new event of the same kind for a random future time is generated to replace it. As a side effect, a few (<10) existing events change their time of occurrence and need to get rescheduled. The IDs of these to-be-rescheduled events are known but not their times of occurrence.

I was thinking a heap would be good to get the lowest element quickly but I also need quick reordering of arbitrary elements which are accessed by ID. There is BatHeap which does finding elements and insertion in O(log N), but it seems not to allow indexed access?

I would prefer a persistent structure (partly for educational purposes) but if only a mutable structure works fast, i'll use that.

Gilles 'SO- stop being evil'
  • 104,111
  • 38
  • 209
  • 254
user3240588
  • 1,252
  • 9
  • 16

2 Answers2

2

You need a priority queue for your problem and for priority queue, heap is the best way to go.

If you want a persistent heap, you can implement a Leftist heap in OCaml, for your educational purpose.

However, BatHeap can also meet your need as it is functional heap.


Ok, I know what you wish a heap + map thing now.

You need to know that the operations of map in OCaml is O(logN) because of functional reason. you can use hashtbl, but it uses array which is imperative, not persistent way or functional way.

If you want a pure functional way, and can accept O(logN), then you need to have two datastructures, one is heap and the other is map.

In the map, the key is the event type id, and the value is the heap for that event type.

But I guess even if you invent a HeapMap thing, you need anyway double spaces as two kind of information (order of time and order of index) are maintained.

Jackson Tale
  • 25,428
  • 34
  • 149
  • 271
  • thanks, i now realize that my question was not very precise. when rescheduling, i need to access the future events "by identity" so i now think BatHeap can't even do it... – user3240588 Jan 27 '14 at 14:55
  • Can you update your problem to describe it more precisely? this will help people to help you. For example, what means `by identity`? You need a combination of `heap` and `map`? – Jackson Tale Jan 27 '14 at 15:00
  • i already updated it. the id is the "kind" of an event of which there are N; the 'heap' contains one event of every kind. it could be an integer or some hash, or physical identity of the events as determined by (==). i do think i need some combination of map and heap since i want two ways of indexing into the structure essentially. – user3240588 Jan 27 '14 at 15:05
  • @user3240588 ok, maybe you can maintain an array of `N` heaps? if you want a pure functional way, you can have a `map` of `heaps` where the id of each event kind is the key – Jackson Tale Jan 27 '14 at 15:08
  • thanks, but i think having many heaps will not work. i think there is a misunderstanding: in my description 'id' and 'event type' are the same thing. there is always only one event of every type. so it makes no sense to put that one event into a heap... – user3240588 Jan 27 '14 at 15:44
  • @user3240588 so you mean every event has two properties: `id` and `time`, right? basically, you want `time` to be prioritised and `id` to be indexed, is this correct? – Jackson Tale Jan 27 '14 at 16:19
  • yes, that's correct. @gasche's suggestion of making an ordered container on time and an extra indexed mapping from id to time seems to work for that. – user3240588 Jan 27 '14 at 16:36
  • @user3240588 this was original what I proposed, but then misunderstood by your 2nd comment, but anyway, hope you find your real answer – Jackson Tale Jan 27 '14 at 16:45
2

On first approximation, you can use a Heap or Map (Heap has O(1) remove-first instead of O(log n), but may not implement random removal) paired to a Hashtable that maps ids to times. The mapping structure needs not be persistent as it can be trivially rebuilt from the time-indexed structure.

gasche
  • 31,259
  • 3
  • 78
  • 100
  • ok this sounds feasible. iiuc, there is a Map with float keys which are the times. In log(N) i take out the lowest element, log(N) again insert the replacement event. then i take the ids of the candidates for rescheduling, and look them up in a Hashtbl in O(1), getting back the event times as values. Again log(n) for each of them to get them out of the Map by the times and log(n) to reinsert each of them into the Map with the new modified times. Finally, remove the old entry from the Hashtbl and add the new one (no idea how costly that is) – user3240588 Jan 27 '14 at 15:54
  • actually, if i define integer indices for each event type (id) i could replace the Hashtbl part with a plain float array which holds the times. could be faster? – user3240588 Jan 27 '14 at 16:03
  • yeah, if they're consecutive integers, it would be sensibly faster than a hashtable. But I'm not sure how you can produce a flow of relatively-continuous integers with reasonable bounds, and dynamic insertion/deletion, without resorting to semi-weird compaction tactics. I guess it depends on the application. – gasche Jan 27 '14 at 16:14
  • yes in my particular application it would work. there are exactly N event types, and exactly one of each is scheduled at any given time. so i can just assign indexes from 1 to N to label the event types. – user3240588 Jan 27 '14 at 16:38
  • after some more wiki-googling i am accepting this answer. (sorry @jackson-tale but this is the better explanation). side note for others: a single [treap](https://en.wikipedia.org/wiki/Treap) also seems a plausible option. since the events have random occurrence times, no extra random number for the treap is needed; and frequently occurring event types rise to the top. – user3240588 Jan 27 '14 at 21:44