10

I need to support a more inserts than reads and keep the data sorted. Which would be better performing:

Using a PriorityQueue providing a comparator

or

Using an ArrayList and calling .sort() after each insert?

Calling .sort() each time feels wrong, but I can't articulate why.

lealceldeiro
  • 14,342
  • 6
  • 49
  • 80
AfterWorkGuinness
  • 1,780
  • 4
  • 28
  • 47

2 Answers2

11

A priority queue will not keep your data sorted. It just allows you to make calls to get its minimum element. If you do that for all the elements in the priority queue, you'll eventually be able to form a list of sorted elements. But then again, you'll have an empty priority queue.

So if you need to be able to read things on the fly on any position without mutating the data-structure, the priority queue is not for you.

What you're probably looking for is to use a TreeSet / TreeMap, that allows you to keep the data sorted and insertions / deletions relatively cheap (roughly O(lg n)).

devoured elysium
  • 101,373
  • 131
  • 340
  • 557
  • Overlooked that point. If the question were then between calling .sort() on an ArrayList and using a TreeSet (I'm fine with no dupes), it sounds like TreeSet is faster. – AfterWorkGuinness Oct 02 '18 at 19:50
  • 2
    I would assume the treeset to be considerably slower. A treeset is relatively 'heavy', with self-balancing logic, lots of nodes and pointer indirection, etc. On the other hand, a standard quick / merge sort should be relatively straightforward and without the need to create all those nodes in memory like the treeset does. The only advantage of the treeset is that it's dynamic. – devoured elysium Oct 02 '18 at 19:54
  • 2
    @devouredelysium a `TreeSet` is implemented over a `Map` in general, it's been like that for the past 20 years - not a lot of people complain about indirection, nodes, memory. Use whatever reads the easiest - and its *absolutely* `TreeSet` – Eugene Oct 02 '18 at 20:00
  • 1
    @Eugene The statement was "it sounds like TreeSet is faster (than a sort over an array list)", which is demonstrably false. What is it that you're really complaining about? – devoured elysium Oct 02 '18 at 20:01
  • @Eugene he wants to support more inserting than reading. Maybe reading faster is not what he's looking for. – Judismar Arpini Junior Oct 02 '18 at 20:03
  • 1
    @devouredelysium: Perhaps you missed the phrase "calling .sort() *after each insert*?". Clearly calling `.sort` once is faster. However, calling sort after every insert is worse than quadratic, which is eventually going to slug. – rici Oct 02 '18 at 20:22
  • 1
    @AfterWorkGuinness if you plan to do it *once* (unlike what you have stated in the question), sorting the List will be a winner; If you plan it to do more often like you say in your question - use a `TreeSet`. – Eugene Oct 02 '18 at 20:22
  • @rici to be honest whatever is the obvious thing, is not [always obvious](https://shipilev.net/blog/2016/arrays-wisdom-ancients/) just saying that without proper tests, I don't trust myself sometimes; no matter if it looks obvious or not. – Eugene Oct 02 '18 at 20:26
  • @Eugene: OK, perhaps calling .sort once is not faster. It's a minor point. The super-quadratic algorithm is eventually going to be slower, ancient wisdom or no. :) (Although you seem to agree that calling .sort once will be faster.) – rici Oct 02 '18 at 20:28
  • @rici no no, totally agree - should have made that clear. I was only saying that in *general*, my bad – Eugene Oct 02 '18 at 20:29
  • @Eugene and @rici there's no way to say that `.sort` will be faster than using a `TreeSet` without testing on real-world data, because `sort` performance depends on the distribution of the data before sorting, and the distribution depends on the specific use case, and it can even change from time to time, or depending on the time of day, etc. If the data is almost sorted, `sort` will be almost linear. If it's random, it will be `NlogN` worst case. – fps Oct 02 '18 at 20:38
  • @Federico: I've done lots of sorts of real-world data, and `sort` always works out faster, if you can wait until the dataset is complete before sorting. That's not to say that TreeMaps and PriorityQueues don't have their uses: they absolutely do. In general, their uses will be in cases where there are lots of inserts into the dataset and also lots of occasions where you need to be able to do a sorted lookup on the current state of the dataset. Which is precisely the context of the OP, so the question of whether or not a single sort is faster or slower is irrelevant, as is this comment thread. – rici Oct 02 '18 at 22:02
2

PriorityQueue is a min/max heap - data is not sorted inside it; you would always need to call poll until the queue is exhausted - since the first element inside it is always the "smallest/biggest" one according to your Comparator.

You are really looking for TreeSet; no need to call sort every time you insert/delete an element.

Eugene
  • 117,005
  • 15
  • 201
  • 306