
Question: does anybody know of a Java implementation (I have too little time/knowledge to develop my own right now) of a collection with the following characteristics?

  • fast add
  • fast random-access remove
  • fast minimum value
  • duplicates

Condensed (oversimplified) version of use case is:

  • I have a class that keeps track of 'time', call it TimeClass
  • Events start at monotonically increasing times (times are not unique), but can finish in any order
  • When events start they report their start time to TimeClass
  • When events finish they again report their start time to TimeClass
  • TimeClass adds an event's start time to a collection* when the event starts (fast add)
  • TimeClass removes an event's start time from that collection* when the event finishes (fast random-access remove)
  • TimeClass is capable of reporting the lowest not-yet-finished start time (fast minimum value)

* think of the collection as a Collection<Time>, where Time implements Comparable<Time>
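To make the required contract concrete, here is a minimal sketch of the three operations (the class/method names and the use of plain `long` timestamps are my own illustration, not part of the question), backed naively by a `PriorityQueue` to show where the costs land:

```java
import java.util.PriorityQueue;

// Illustrative sketch only. A PriorityQueue gives O(log n) add and O(1) min,
// but remove(Object) is O(n) - exactly the weakness the benchmarks below probe.
class TimeTracker {
    private final PriorityQueue<Long> startTimes = new PriorityQueue<>();

    void addTime(long t) {                    // fast add: O(log n)
        startTimes.add(t);
    }

    boolean removeTime(long t) {              // random-access remove: O(n) here
        return startTimes.remove(t);
    }

    Long minTime() {                          // fast min: O(1); null when empty
        return startTimes.peek();
    }
}
```

Duplicates work naturally here because `PriorityQueue` stores every inserted element, and `remove` takes out exactly one matching entry.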

Because I'm not sure what the runtime behavior of my system (the system in which TimeClass lives) will be, I've quickly benchmarked the following scenarios using these collections: TreeMultiSet (Guava), MinMaxPriorityQueue (Guava), ArrayList.

Note: depending on the collection used, the min value is retrieved differently (remember, elements are added in increasing order): TreeMultiSet.firstEntry().getElement(), MinMaxPriorityQueue.peekFirst(), ArrayList.get(0).

ADD 1,000,000:

  • TreeMultiSet: 00:00.897 (m:s.ms)
  • List: 00:00.068 (m:s.ms)
  • MinMaxPriorityQueue: 00:00.658 (m:s.ms)

ADD 1, REMOVE 1, REPEAT 1,000,000 TIMES:

  • TreeMultiSet: 00:00.673 (m:s.ms)
  • List: 00:00.416 (m:s.ms)
  • MinMaxPriorityQueue: 00:00.469 (m:s.ms)

ADD 10,000 IN SEQUENTIAL ORDER, REMOVE ALL IN SEQUENTIAL ORDER:

  • TreeMultiSet: 00:00.068 (m:s.ms)
  • List: 00:00.031 (m:s.ms)
  • MinMaxPriorityQueue: 00:00.048 (m:s.ms)

ADD 10,000 IN SEQUENTIAL ORDER, REMOVE ALL IN RANDOM ORDER:

  • TreeMultiSet: 00:00.046 (m:s.ms)
  • List: 00:00.352 (m:s.ms)
  • MinMaxPriorityQueue: 00:00.888 (m:s.ms)

Current thoughts:

I'm leaning towards using TreeMultiSet, as it has the most stable performance and seems to degrade most gracefully. I would love more suggestions.

Thanks

--EDIT--

Example pseudo code of ADD ALL IN SEQUENTIAL ORDER, REMOVE ALL IN RANDOM ORDER:

void benchmark() {
    int benchmarkSize = 1_000_000;
    int benchmarkRepetitions = 100;
    Duration totalDuration = Duration.ZERO;
    TimeClass timeClass = new TimeClassImplementation();
    for (int i = 0; i < benchmarkRepetitions; i++)
        totalDuration = totalDuration.plus(benchmarkRun(timeClass, benchmarkSize));
    System.out.println(totalDuration);
}

Duration benchmarkRun(TimeClass timeClass, int count) {
    List<Time> times = createMonotonicallyIncreasingTimes(count);

    // monotonically increasing times to add from
    List<Time> timesToAddFrom = copy(times);

    // random times to remove from
    List<Time> timesToRemoveFrom = shuffleUniformly(copy(times));

    Instant startTime = now();

    // add all times
    for (Time time : timesToAddFrom) {
        Time min = timeClass.addTimeAndGetMinimumValue(time);
        // min value intentionally unused
    }

    // remove all times
    for (Time time : timesToRemoveFrom) {
        Time min = timeClass.removeTimeAndGetMinimumValue(time);
        // min value intentionally unused
    }

    Instant finishTime = now();

    return Duration.between(startTime, finishTime);
}
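For concreteness, the two helper routines from the pseudo code could look like this (plain `long` timestamps as a stand-in for `Time`; the helper names follow the pseudo code, the bodies are my own assumption):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class BenchmarkHelpers {
    // Monotonically increasing (non-decreasing) times with occasional
    // duplicates, mimicking events that can start at the same time.
    static List<Long> createMonotonicallyIncreasingTimes(int count) {
        Random rnd = new Random(42);        // fixed seed for repeatable runs
        List<Long> times = new ArrayList<>(count);
        long t = 0;
        for (int i = 0; i < count; i++) {
            t += rnd.nextInt(2);            // step of 0 or 1, so duplicates occur
            times.add(t);
        }
        return times;
    }

    // Uniformly random removal order over a defensive copy.
    static List<Long> shuffleUniformly(List<Long> times) {
        List<Long> copy = new ArrayList<>(times);
        Collections.shuffle(copy, new Random(1234));
        return copy;
    }
}
```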
Alex Averbuch
  • "I've quickly benchmarked the following scenarios". How did you do that? Show us your benchmark code. – Aug 01 '14 at 10:00
  • minimum value depends on value itself or on adding order? – udalmik Aug 01 '14 at 10:08
  • @Tichodroma, two points: (1) I've added some pseudo code now (assume that the timeClass instances passed in have different underlying implementations, using different collections) (2) there was an error in my code where I ran LinkedList twice and MinMaxPriorityQueue never, I am rerunning to get MinMaxPriorityQueue numbers now – Alex Averbuch Aug 01 '14 at 10:17
  • @udalmik the value itself Time implements Comparable – Alex Averbuch Aug 01 '14 at 10:17
  • I've edited the results now, to show corrected values for MinMaxPriorityQueue performance. I used 10,000 instead of 1,000,000 in some tests for two reasons: (1) tests take too long otherwise (2) MinMaxPriorityQueue seems to have a bug where it loses values if the collection size is too large (i.e. remove(Time) returns false) – Alex Averbuch Aug 01 '14 at 10:54
  • 2
    The bad news is that benchmarking in Java is much harder than expected. The results of any measurement taking a fraction of second are essentially just (low quality) random numbers, unusable for anything, only slightly correlated to real use performance (after the code gets warmed up, etc.). See [this question](http://stackoverflow.com/q/504103/581205). – maaartinus Aug 03 '14 at 18:40
  • thanks @maaartinus very useful resource! FYI, (1) I've run longer tests (up to 15s) too and (2) I'm only really comparing those numbers as an orders-of-magnitude comparison in the scenarios I consider most important -> "ADD 10,000 IN SEQUENTIAL ORDER, REMOVE ALL IN RANDOM ORDER" – Alex Averbuch Aug 03 '14 at 20:44

1 Answer


Your best bet is a tree map:

http://docs.oracle.com/javase/7/docs/api/java/util/TreeMap.html

It is O(log n) for pretty much all operations, and you can iterate over your keys in sorted order.

There is also MinMaxPriorityQueue from Google (Guava):

http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/MinMaxPriorityQueue.html

Its remove is O(n), though; all other operations are O(log n).
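Since Time values are not unique, a plain TreeMap needs a counter per key to behave like a multiset (the approach ColinD describes in the comments below). A minimal sketch, with illustrative names and `long` timestamps standing in for `Time`:

```java
import java.util.TreeMap;

// Illustrative multiset of start times built on TreeMap: each distinct
// time is stored once, together with how many open events share it.
class TimeMultiset {
    private final TreeMap<Long, Integer> counts = new TreeMap<>();

    void add(long t) {                       // O(log n)
        counts.merge(t, 1, Integer::sum);
    }

    boolean remove(long t) {                 // O(log n), unlike the O(n) queue remove
        Integer c = counts.get(t);
        if (c == null) return false;
        if (c == 1) counts.remove(t);
        else counts.put(t, c - 1);
        return true;
    }

    Long min() {                             // smallest not-yet-finished start time
        return counts.isEmpty() ? null : counts.firstKey();
    }
}
```

This keeps all three required operations at O(log n) or better while still allowing duplicate times, which is essentially what Guava's TreeMultiset does internally.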

g00dnatur3
  • As explained in my question, I have tried MinMaxPriorityQueue already. Also, without a wrapper around it, how would TreeMap fit my needs? I have no keys because Times can have duplicate values. I **could** keep a counter in the value (e.g., Map – Alex Averbuch Aug 01 '14 at 11:00
  • Sure, and I appreciate the input, but making keys unique changes the problem – Alex Averbuch Aug 01 '14 at 14:23
  • @user1500191: What you're talking about is totally beside the point. It's possible for a data structure to have non-unique entries without sacrificing performance. In the case of something like a `Multiset`, it's pretty straightforward to just store unique elements along with the _count_ of each element internally, while presenting that as multiple copies of the same element through the API. – ColinD Aug 05 '14 at 16:55