
Question: does anybody know of a Java implementation (I have too little time/knowledge to develop my own right now) of a collection with the following characteristics?

  • fast add
  • fast random-access remove
  • fast minimum value
  • duplicates

Condensed (oversimplified) version of use case is:

  • I have a class that keeps track of 'time', call it TimeClass
  • Events start at monotonically increasing times (times are not unique), but can finish in any order
  • When events start they report their start time to TimeClass
  • When events finish they again report their start time to TimeClass
  • TimeClass adds an event's start time to a collection* when the event starts (fast add)
  • TimeClass removes an event's start time from that collection* when the event finishes (fast random-access remove)
  • TimeClass is capable of reporting the lowest not-yet-finished start time (fast minimum value)

* think of the collection as a Collection<Time>, where Time implements Comparable<Time>
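To make the required contract concrete, here is a minimal sketch of the three operations (the class/method names and the use of plain `long` timestamps are my own illustration, not part of the question), backed naively by a `PriorityQueue` to show where the costs land:

```java
import java.util.PriorityQueue;

// Illustrative sketch only. A PriorityQueue gives O(log n) add and O(1) min,
// but remove(Object) is O(n) - exactly the weakness the benchmarks below probe.
class TimeTracker {
    private final PriorityQueue<Long> startTimes = new PriorityQueue<>();

    void addTime(long t) {                    // fast add: O(log n)
        startTimes.add(t);
    }

    boolean removeTime(long t) {              // random-access remove: O(n) here
        return startTimes.remove(t);
    }

    Long minTime() {                          // fast min: O(1); null when empty
        return startTimes.peek();
    }
}
```

Duplicates work naturally here because `PriorityQueue` stores every inserted element, and `remove` takes out exactly one matching entry.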

Because I'm not sure what the runtime behavior of my system (the system in which TimeClass lives) will be, I've quickly benchmarked the following scenarios using these collections: TreeMultiSet (Guava), MinMaxPriorityQueue (Guava), ArrayList.

Note: depending on the collection used, the min value is retrieved differently (remember, elements are added in increasing order): TreeMultiSet.firstEntry().getElement(), MinMaxPriorityQueue.peekFirst(), ArrayList.get(0).

ADD 1,000,000:

  • TreeMultiSet: 00:00.897 (m:s.ms)
  • List: 00:00.068 (m:s.ms)
  • MinMaxPriorityQueue: 00:00.658 (m:s.ms)

ADD 1, REMOVE 1, REPEAT 1,000,000 TIMES:

  • TreeMultiSet: 00:00.673 (m:s.ms)
  • List: 00:00.416 (m:s.ms)
  • MinMaxPriorityQueue: 00:00.469 (m:s.ms)

ADD 10,000 IN SEQUENTIAL ORDER, REMOVE ALL IN SEQUENTIAL ORDER:

  • TreeMultiSet: 00:00.068 (m:s.ms)
  • List: 00:00.031 (m:s.ms)
  • MinMaxPriorityQueue: 00:00.048 (m:s.ms)

ADD 10,000 IN SEQUENTIAL ORDER, REMOVE ALL IN RANDOM ORDER:

  • TreeMultiSet: 00:00.046 (m:s.ms)
  • List: 00:00.352 (m:s.ms)
  • MinMaxPriorityQueue: 00:00.888 (m:s.ms)

Current thoughts:

I'm leaning towards using TreeMultiSet, as it has the most stable performance and seems to degrade most gracefully. I would love more suggestions.

Thanks

--EDIT--

Example pseudo code of ADD ALL IN SEQUENTIAL ORDER, REMOVE ALL IN RANDOM ORDER:

void benchmark() {
    int benchmarkSize = 1_000_000;
    int benchmarkRepetitions = 100;
    Duration totalDuration = Duration.ZERO;
    TimeClass timeClass = new TimeClassImplementation();
    for (int i = 0; i < benchmarkRepetitions; i++)
        totalDuration = totalDuration.plus(benchmarkRun(timeClass, benchmarkSize));
    System.out.println(totalDuration);
}

Duration benchmarkRun(TimeClass timeClass, int count) {
    List<Time> times = createMonotonicallyIncreasingTimes(count);

    // monotonically increasing times to add from
    List<Time> timesToAddFrom = copy(times);

    // random times to remove from
    List<Time> timesToRemoveFrom = shuffleUniformly(copy(times));

    Instant startTime = now();

    // add all times
    for (Time time : timesToAddFrom) {
        Time min = timeClass.addTimeAndGetMinimumValue(time);
        // min value intentionally unused
    }

    // remove all times
    for (Time time : timesToRemoveFrom) {
        Time min = timeClass.removeTimeAndGetMinimumValue(time);
        // min value intentionally unused
    }

    Instant finishTime = now();

    return Duration.between(startTime, finishTime);
}
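For concreteness, the two helper routines from the pseudo code could look like this (plain `long` timestamps as a stand-in for `Time`; the helper names follow the pseudo code, the bodies are my own assumption):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class BenchmarkHelpers {
    // Monotonically increasing (non-decreasing) times with occasional
    // duplicates, mimicking events that can start at the same time.
    static List<Long> createMonotonicallyIncreasingTimes(int count) {
        Random rnd = new Random(42);        // fixed seed for repeatable runs
        List<Long> times = new ArrayList<>(count);
        long t = 0;
        for (int i = 0; i < count; i++) {
            t += rnd.nextInt(2);            // step of 0 or 1, so duplicates occur
            times.add(t);
        }
        return times;
    }

    // Uniformly random removal order over a defensive copy.
    static List<Long> shuffleUniformly(List<Long> times) {
        List<Long> copy = new ArrayList<>(times);
        Collections.shuffle(copy, new Random(1234));
        return copy;
    }
}
```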
Alex Averbuch
  • "I've quickly benchmarked the following scenarios". How did you do that? Show us your benchmark code. – Aug 01 '14 at 10:00
  • minimum value depends on value itself or on adding order? – udalmik Aug 01 '14 at 10:08
  • @Tichodroma, two points: (1) I've added some pseudo code now (assume that the timeClass instances passed in have different underlying implementations, using different collections) (2) there was an error in my code where I ran LinkedList twice and MinMaxPriorityQueue never, I am rerunning to get MinMaxPriorityQueue numbers now – Alex Averbuch Aug 01 '14 at 10:17
  • @udalmik the value itself Time implements Comparable – Alex Averbuch Aug 01 '14 at 10:17
  • I've edited the results now, to show corrected values for MinMaxPriorityQueue performance. I used 10,000 instead of 1,000,000 in some tests for two reasons: (1) tests take too long otherwise (2) MinMaxPriorityQueue seems to have a bug where it loses values if the collection size is too large (i.e. remove(Time) returns false) – Alex Averbuch Aug 01 '14 at 10:54
  • 2
    The bad news is that benchmarking in Java is much harder than expected. The results of any measurement taking a fraction of second are essentially just (low quality) random numbers, unusable for anything, only slightly correlated to real use performance (after the code gets warmed up, etc.). See [this question](http://stackoverflow.com/q/504103/581205). – maaartinus Aug 03 '14 at 18:40
  • thanks @maaartinus very useful resource! FYI, (1) I've run longer tests (up to 15s) too and (2) I'm only really comparing those numbers as an orders-of-magnitude comparison in the scenarios I consider most important -> "ADD 10,000 IN SEQUENTIAL ORDER, REMOVE ALL IN RANDOM ORDER" – Alex Averbuch Aug 03 '14 at 20:44

1 Answer


Your best bet is a tree map:

http://docs.oracle.com/javase/7/docs/api/java/util/TreeMap.html

It is O(log n) for pretty much all operations, and you can iterate over your keys in sorted order.

There is also MinMaxPriorityQueue from Google (Guava):

http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/MinMaxPriorityQueue.html

Its remove is O(n), though; all other operations are O(log n).
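Since Time values are not unique, a plain TreeMap needs a counter per key to behave like a multiset (the approach ColinD describes in the comments below). A minimal sketch, with illustrative names and `long` timestamps standing in for `Time`:

```java
import java.util.TreeMap;

// Illustrative multiset of start times built on TreeMap: each distinct
// time is stored once, together with how many open events share it.
class TimeMultiset {
    private final TreeMap<Long, Integer> counts = new TreeMap<>();

    void add(long t) {                       // O(log n)
        counts.merge(t, 1, Integer::sum);
    }

    boolean remove(long t) {                 // O(log n), unlike the O(n) queue remove
        Integer c = counts.get(t);
        if (c == null) return false;
        if (c == 1) counts.remove(t);
        else counts.put(t, c - 1);
        return true;
    }

    Long min() {                             // smallest not-yet-finished start time
        return counts.isEmpty() ? null : counts.firstKey();
    }
}
```

This keeps all three required operations at O(log n) or better while still allowing duplicate times, which is essentially what Guava's TreeMultiset does internally.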

g00dnatur3
  • As explained in my question, I have tried MinMaxPriorityQueue already. Also, without a wrapper around it, how would TreeMap fit my needs? I have no keys because Times can have duplicate values. I **could** keep a counter in the value (e.g., Map – Alex Averbuch Aug 01 '14 at 11:00
  • Sure, and I appreciate the input, but making keys unique changes the problem – Alex Averbuch Aug 01 '14 at 14:23
  • @user1500191: What you're talking about is totally beside the point. It's possible for a data structure to have non-unique entries without sacrificing performance. In the case of something like a `Multiset`, it's pretty straightforward to just store unique elements along with the _count_ of each element internally, while presenting that as multiple copies of the same element through the API. – ColinD Aug 05 '14 at 16:55