3

Is it somehow possible to iterate over a collection sorting the content on the fly without creating a copy?

UPDATE: Collection to sort is a read-only list.

30thh
  • 10,861
  • 6
  • 32
  • 42
  • You can sort in place (e.g. quicksort) if the collection permits indexed access. This does not create a copy. - It won't be quite "on the fly" – laune Feb 11 '15 at 08:32
  • 1
    @Salah That other Q was about "parallel sort" so it's rather different. – laune Feb 11 '15 at 08:33
  • What's so bad about "creating a copy"? How big is your collection really? – laune Feb 11 '15 at 09:21
  • Copy is an option. But I thought, there is an alternative way. The size is about 10000. And there are thousands of them. It is batch process on a lightweight server. No time limit, but the memory is limited. – 30thh Feb 11 '15 at 19:01
  • A collection of 10000 requires 10000 references with an ArrayList, so the backing array is 4/8 times 10000. - Not really "thousands of them" at the same time?? – laune Feb 11 '15 at 19:12

2 Answers2

5

Using the Streams API you could do

yourCollection.stream()
              .sorted(yourComparator)
              .forEach(...);

This does not modify yourCollection and allows you to iterate over the collection in a sorted order. However, chances are that the sorted method creates a copy behind the scenes, so you'll most likely get the same memory/cpu overhead as if you create a copy, sort the copy and iterate over the sorted copy yourself.

(For Java 7 and earlier, I don't think there's a "non-intrusive" sorting method in the API. You'll have to explicitly make a copy if you don't want to modify the original collection.)

aioobe
  • 413,195
  • 112
  • 811
  • 826
  • 1
    This does not create a copy? – laune Feb 11 '15 at 08:35
  • 4
    No, it totally does, internally. – Louis Wasserman Feb 11 '15 at 08:36
  • That depends on what you mean by copy. Will it have to look at the last element before outputing the first? Yes. Will it require 100% memory overhead, possibly not. – aioobe Feb 11 '15 at 08:37
  • If you look at the current implementation in the JDK, yes, it does make a copy internally. – Louis Wasserman Feb 11 '15 at 08:37
  • 1
    There's big chance that all popular JDK implementations and all popular collection-implementations gives you streams that implements .sorted by copying the stream. But there's certainly nothing that prevents a collection from implementing .stream with a low-memory-overhead .sorted method. – aioobe Feb 11 '15 at 08:44
  • 2
    It makes a copy because that's usually smarter. Making a copy is O(n), sorting is O(n\*log(n)), iterating is O(n). Trying to iterate a non-sorted collection in order would be, as far as I know, O(n\*n). Memory usage is, in both cases, O(n). So, it's pointless to not make a copy, unless there are memory constraints, but that would mean the array is huge, which also makes O(n\*n) infeasible. – Giulio Franco Feb 11 '15 at 08:58
  • @GiulioFranco "It" being streams sorted()? – laune Feb 11 '15 at 09:00
  • @laune yes, I was answering to your first question "This does not create a copy?" – Giulio Franco Feb 11 '15 at 09:02
  • @GiulioFranco Found the javadoc for Collections.sort: one for all, copies, does a merge sort. – laune Feb 11 '15 at 09:11
  • @laune yes. That's already said above. I was just trying to explain why Java developers made this choice, and why you shouldn't try to do it differently. – Giulio Franco Feb 11 '15 at 09:14
  • 1
    One always forgets that "copying an Object array" doesn't mean "copying all Objects in an array". – laune Feb 11 '15 at 09:20
  • @GiulioFranco O(nn) is only for the straight forward implementation. If you are using a buffer it can be O(nn/x), where "x" is a buffer size. – 30thh Feb 12 '15 at 12:26
  • 1
    @30thh, not sure I follow. If x is a fixed constant, then O(nn/x) = O(nn). If x is proportional to n, you still have a linear memory overhead. – aioobe Feb 12 '15 at 13:28
  • @aioobe: Theoretically you are of course right. X is fixed and N not. But for a real piece of code, where N is limited by design, such an optimisation makes perfectly sense. – 30thh Feb 13 '15 at 09:50
  • Absolutely. The memory/cpu trade-off can be fine tuned with the approach you suggest. Regarding your comment *"Collections.sort: one for all, copies, does a merge sort."* This should not be taken as an argument that other sort methods (such as List.sort) takes the same approach. See third paragraph of [this answer](http://stackoverflow.com/a/25962666/276052). – aioobe Feb 13 '15 at 10:38
1

Let's say you're trying to iterate through the sorted version of a collection.

What's the first element? The minimum element.

But you have to read through the entire collection to determine the minimum element, or there's no guarantee you haven't missed some element that comes later that's smaller than everything before it.

That more or less means you have to make a copy: you have to have read the whole collection before you can start returning elements, after which starting to read it again is kind of redundant.

Louis Wasserman
  • 191,574
  • 25
  • 345
  • 413
  • If the original list is random access, this is not evidence that it's not possible. (And it is possible, just not efficiently; it should be easy to imagine a streaming version of selection sort) – user253751 Feb 11 '15 at 09:05
  • 1
    Of course one have to iterate over the collection multiple time. Sometimes it is better, than a copy. For example if you have more CPU than memory. – 30thh Feb 11 '15 at 19:12
  • But even iterating over it multiple times doesn't get you much -- the naive way of doing it is O(n^2), which isn't really reasonable under any circumstances. I'm not aware of any library in any language that really does what you're trying to do. – Louis Wasserman Feb 11 '15 at 19:15