2

Is it better to pass the size of Collection to Collection constructor if I know the size at that point? Is the saving effect in regards to expanding Collection and allocating/re-allocating noticable?

What if I know minimal size of the Collection but not the upper bound. It's still worth creating it at least with minimal size?

Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
ps-aux
  • 11,627
  • 25
  • 81
  • 128

5 Answers5

6

Different collections have different performance consequences for this, for ArrayList the saving can be very noticeable.

import java.util.*;
public class Main{
public static void main(String[] args){
  List<Integer> numbers = new ArrayList<Integer>(5);
  int max = 1000000;
  // Warmup
  for (int i=0;i<max;i++) {
    numbers.add(i);
  }

  long start = System.currentTimeMillis();
  numbers = new ArrayList<Integer>(max);
  for (int i=0;i<max;i++) {
    numbers.add(i);
  }
  System.out.println("Preall: "+(System.currentTimeMillis()-start));

  start = System.currentTimeMillis();
  numbers = new ArrayList<Integer>(5);
  for (int i=0;i<max;i++) {
    numbers.add(i);
  }
  System.out.println("Resizing: "+(System.currentTimeMillis()-start));

}
}

Result:

Preall: 26
Resizing: 58

Running with max set to 10 times the value at 10000000 gives:

Preall: 510
Resizing: 935

So you can see even at different sizes the ratio stays around the same.

This is pretty much a worst-case test but filling an array one element at a time is very common and you can see that there was a roughly 2*speed difference.

Tim B
  • 40,716
  • 16
  • 83
  • 128
  • I have actually calculated it once and you need to hit a very narrow edge case to make it noticeable. Assume the minimal object size, 24 bytes; a reference in the array costs 4 bytes (assuming compressed OOPs). The worst possible overhead is 4 bytes per object, that is, 1/7 of the useful allocation. Is that "very noticeable"? One thing is sure: is as noticeable as you'll ever get. – Marko Topolnik Jan 15 '14 at 13:41
  • Just out of curiosity, at roughly what size does it become noticable? – Eric Jan 15 '14 at 13:41
  • 3
    @MarkoTopolnik Why on earth would you calculate that? ;-) More seriously, that only addresses the memory aspect but not the performance cost of resizing (which I have not measured). Time for a quick jmh? – assylias Jan 15 '14 at 13:45
  • @assylias I was just going to say something about that :) The cost is quite small so at least it would be a case of premature optimization to start out by assuming it matters. – Marko Topolnik Jan 15 '14 at 13:47
  • 1
    @timb Please, not `currentTimeMillis` :) – Marko Topolnik Jan 15 '14 at 13:48
  • @MarkoTopolnik premature optimisation is an interesting concept when the said optimisation leads to less readable, more complicated etc. code. In this case `new ArrayList<>();` and `new ArrayList<>(size)` are not that different that I would place the idea in the premature optimisation bucket. – assylias Jan 15 '14 at 13:48
  • @MarkoTopolnik Its only premature optimization is its even slightly difficult. Sometimes you have no idea how big it will be. But usually you do and its really easy to have a rough guess – Richard Tingle Jan 15 '14 at 13:49
  • @MarkoTopolnik I could use nanoTime but then I have to divide the result by lots to be meaningful. This test is large enough and the difference large enough that millisecond resolution is plenty. – Tim B Jan 15 '14 at 13:49
  • @assylias There are lots of small things like that around Java, each individual thing not being a big deal. However, having your code thickly drowned with such minor optimization concerns is still the wrong way to go, IMO. For example, while dealing with Socket I/O, which internally does all kinds of "redundant" copying from buffer to buffer, that initial size on the arraylist is little more than a soothing thumb in your mouth :) – Marko Topolnik Jan 15 '14 at 13:54
  • Just ran the tests in a more stable and faster environment (was getting widely varying results on the first one) and am consistently getting twice the speed on the pre-allocated case. – Tim B Jan 15 '14 at 13:54
  • 1
    @MarkoTopolnik I don't know, it's just a habit, when I write `new ArrayList<>()` the first thing I ask myself is: do I know the final size, and if I do I just type it. But I take your point. – assylias Jan 15 '14 at 14:01
  • 1
    @assylias Quick jmh done :) Results for 10,000-sized arrayList: default size 80 µs, presized 50 µs. Results for 1,000-sized arrayList: default size 6.2 µs, presized 4.9 µs. Conclusion: the ratio itself is *barely* significant, however the overall time spent on that task is insignificant. – Marko Topolnik Jan 15 '14 at 14:05
  • @MarkoTopolnik You should write an answer for a counter-opinion of what we have had so far ;-) – assylias Jan 15 '14 at 14:06
  • @MarkoTopolnik I've give you overall time being insignificant (assuming you're not doing a lot of these operations). But a ratio of almost 2:1 is not insignificant at all – Richard Tingle Jan 15 '14 at 14:06
  • You also generate less memory garbage which is significant on some systems (Android) with relatively poor garbage collection strategies. – Tim B Jan 15 '14 at 14:08
  • You may not be running at quite worst case here, since you are creating new `Integer` objects each time (once you're out of the cached autoboxing range). Adding many duplicates (something I have done in the past) then the difference may be even more pronounced – Richard Tingle Jan 15 '14 at 14:13
  • @RichardTingle If the *whole application* was running at 5/8 of the full potential speed, I would call that difference *barely significant*. However, if a super-fast subtask, a very minor part of the big picture, runs at 5/8 of its potential speed, then I call that *absolutely insignificant*. – Marko Topolnik Jan 15 '14 at 14:13
  • @RichardTingle Why would the difference be more pronounced? It could be *less pronounced*, even, because that's a constant in both approaches. Do you say that it is *faster* to allocate new objects? Because only that way would the ratio be more pronounced. – Marko Topolnik Jan 15 '14 at 14:14
  • 1
    The allocation of new Integers is constant time each time around the loop. That constant time reduces the ratio between the actual things that are being measured (not the absolute difference but the % difference). – Tim B Jan 15 '14 at 14:17
  • @MarkoTopolnik I mean if the new Integer() operation was free (which is impossible but run with it) the percentage difference between preallocated and default allocated would be larger. Using duplicates isn't free, but might be cheaper than a new `Integer`. Assuming that the Integer() part costs 1000 and all this messing about costs 1 or 2. Then (1000+1)/(1000+2) is very close to 1 (0+1)/(0+2) is much father from 1 – Richard Tingle Jan 15 '14 at 14:17
  • @RichardTingle Didn't entirely understand what you mean, but I have measured it instead, and the results are just as I expected: 105 vs. 70 µs---indicating more overhead with `new Integer()`, and a diminished ratio. – Marko Topolnik Jan 15 '14 at 14:21
  • @MarkoTopolnik `numbers.add(i);` has an implicit `new Integer()` because i is an `int`and the list contains `Integer`s. I was suggesting `Integer singleInteger=new Integer()` at the top and then adding `singleInteger` over and over again – Richard Tingle Jan 15 '14 at 14:25
  • @RichardTingle `numbers.add(1)` translates into `numbers.add(Integer.valueOf(1))`, and `Integer.valueOf(1)` translates into `Integer.IntegerCache[129]` :) – Marko Topolnik Jan 15 '14 at 14:28
  • @MarkoTopolnik Something like this: `Integer duplicate=5; numbers = new ArrayList(max); for (int i=0;i – Richard Tingle Jan 15 '14 at 14:33
  • @RichardTingle But that's exactly the same as I did. Why would you think it isn't? – Marko Topolnik Jan 15 '14 at 14:34
  • @MarkoTopolnik Running the above test case with this added I get: `Resizing: 43 Preall: 21 Duplicate entry: 8` but it feels dodge, altering the ordering changes the results – Richard Tingle Jan 15 '14 at 14:35
  • @MarkoTopolnik ahh, bloody `1`s looking like `i`s – Richard Tingle Jan 15 '14 at 14:36
  • @RichardTingle Actually, using a local var instead of calling `valueOf` each time did manage to shave another 5 µs off the times. Now I got 74 vs. 45 µs (default size vs. presized). I expected the array access to be hoisted out of the loop... – Marko Topolnik Jan 15 '14 at 14:42
5

OK, here's my jmh code:

@OutputTimeUnit(TimeUnit.MICROSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 3, time = 1)
@Fork(3)
public class Comparison
{
  static final int size = 1_000;
  @GenerateMicroBenchmark
  public List<?> testSpecifiedSize() {
    final ArrayList<Integer> l = new ArrayList(size);
    for (int i = 0; i < size; i++) l.add(1);
    return l;
  }

  @GenerateMicroBenchmark
  public List<?> testDefaultSize() {
    final ArrayList<Integer> l = new ArrayList();
    for (int i = 0; i < size; i++) l.add(1);
    return l;
  }
}

My results for size = 10_000:

Benchmark             Mode Thr    Cnt  Sec         Mean   Mean error    Units
testDefaultSize       avgt   1      9    1       80.770        2.095  usec/op
testSpecifiedSize     avgt   1      9    1       50.060        1.078  usec/op

Results for size = 1_000:

Benchmark             Mode Thr    Cnt  Sec         Mean   Mean error    Units
testDefaultSize       avgt   1      9    1        6.208        0.131  usec/op
testSpecifiedSize     avgt   1      9    1        4.900        0.078  usec/op

My interpretation:

  • presizing has some edge on the default size;
  • the edge isn't that spectacular;
  • the absolute time spent on the task of adding to the list is quite insignificant.

My conclusion:

Add the initial size if that makes you feel warmer around the heart, but objectively speaking, your customer is highly unlikely to notice the difference.

Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
  • What if you add something more complex than just `Integer`? I suppose if you used object that takes some Mb you would see pauses from garbarge collector – ADS Jan 15 '14 at 14:21
  • 1
    @ADS That makes no difference, you are only adding references. – assylias Jan 15 '14 at 14:22
  • Its only worth worrying about if its a bottleneck point. In anything that is "real time" you have (1/60s) 16666us to do everything. A saving of 40us on a single ArrayList creation is huge (given that you'll do this many many times in a single tick). These were the exact kinds of thing I did to make my application run at 60 frames per second rather than 30 and thats the difference between it looking smooth and being a jaring mess – Richard Tingle Jan 15 '14 at 14:23
  • @ADS Not just what assylias said, but that would actually be a distraction from what is being measured here: the resizing of the ArrayList. – Marko Topolnik Jan 15 '14 at 14:23
  • @RichardTingle When I did my time-critical GUI animation code, I helped myself to *arrays*---that's what I call optimization :) – Marko Topolnik Jan 15 '14 at 14:25
  • @MarkoTopolnik If I'm honest so have I in most places, there are times when that is impracticle however – Richard Tingle Jan 15 '14 at 14:26
  • @MarkoTopolnik I'm annoyed to say that the numbers do prove your point and prove me somewhat (let's not be too self-inflicting) wrong... – assylias Jan 15 '14 at 14:37
  • +1 from me, in that I agree with your results (and they are reasonably close to mine). Not completely convinced with your conclusion. My policy has always been that I wouldn't go to any great lengths to calculate the size but when it's available I'll use it. – Tim B Jan 15 '14 at 14:43
  • 2
    @assylias, TimB I'd say we can write this off to a preference in style... the objective measurements do not support a rule which would mandate presizing as a relevant best practice, but at least they give you *something*. Therefore we have a minor code style issue on one side (a bit more code everywhere) and a minor performance issue on the other. It comes down to how much you value not having extra code in the constructor call. I for one value it more than the quite abstract advantage I may get from it. – Marko Topolnik Jan 15 '14 at 14:48
  • 1
    I have reran the test with `size = 810_326` (obviously cherry picked: the default arraylist gets resized at 810,325) and get 3.2ms vs 5ms. So you win even in that worst case. – assylias Jan 15 '14 at 14:50
  • @assylias Hats off to your cherry-picking! – Marko Topolnik Jan 15 '14 at 14:57
  • #jmh tag summoned me here. (*BENCHMARKING SEAL OF APPROVAL*) – Aleksey Shipilev Jan 15 '14 at 23:46
  • @MarkoTopolnik can you please explain what are those annotations? I am not familiar with them. – MaheshVarma Jan 16 '14 at 03:43
  • @AlekseyShipilev The tag was posted for that precise reason---and thanks for the seal :) – Marko Topolnik Jan 16 '14 at 08:04
  • @MaheshVarma They are `jmh`-specific and give configuration data to the tool. Most of them can be specified on the command line as well. – Marko Topolnik Jan 16 '14 at 08:08
4

All collections are auto-expanding. Not knowing the bounds will not affect their functionality (until you run into other issues such as using all available memory etc), it may however affect their performance.

With some collections. Most notably the ArrayList, auto expanding is expensive as the whole underlying array is copied; array lists are default sized at 10 and then double in size each time they get to their maximum. So, say you know your arraylist will contain 110 objects but do not give it a size, the following copies will happen

Copy 10 --> 20
Copy 20 --> 40
Copy 40 --> 80
Copy 80 --> 160

By telling the arraylist up front that it contains 110 items you skip these copies.

An educated guess is better than nothing

Even if you're wrong it doesn't matter. The collection will still autoexpand and you will still avoid some copies. The only way you can decrease performance is if your guess is far far too large: which will lead to too much memory being allocated to the collection

Community
  • 1
  • 1
Richard Tingle
  • 16,906
  • 5
  • 52
  • 77
  • I don't know what version of java you're refering to, but I think the default initial capacity of an `ArrayList` nowadays is ten (see [javadoc](http://docs.oracle.com/javase/7/docs/api/java/util/ArrayList.html#ArrayList%28%29) ) – fabian Jan 15 '14 at 13:51
  • @fabian Interesting, edited, I've always thought it was 12. Weird. And disapointing, 12 sounds like they did some careful analysis and found that 12 was the perfect amount such that its multiples fitted typical data. 10 sounds like "10'll do" – Richard Tingle Jan 15 '14 at 13:54
  • Well since it's doubling each time the maximum difference for any initial value between 5 and 20 is 1 extra or fewer resizing...so it probably isn't worth trying to do anything super-clever really. – Tim B Jan 15 '14 at 14:00
0

In the rare cases when the size is well known (for example when filling a know number of elements into a new collection), it may be set for performance reasons.

Most often it's better to ommit it and use the default constructor instead, leading to simpler and better understandable code.

Peter Walser
  • 15,208
  • 4
  • 51
  • 78
  • 1
    If you've got a decent guess I don't see the downside of using it. Its fairly transparent (assuming you don't have a load of 'size estimating code') – Richard Tingle Jan 15 '14 at 13:56
0

For array-based collections re-sizing is a quite expensive operation. That's why pass exact size for ArrayList is a good idea.

If you set up size to a minimal size(MIN) and then add to the collection MIN+1 elements, then you got re-sizing. ArrayList() invokes ArrayList(10) so if MIN is big enough then you get some advantage. But the best way is to create ArrayList with expecting collection size.

But possibly you prefer LinkedList because it has no any costs for adding elements (although list.get(i) have O(i) cost)

ADS
  • 708
  • 3
  • 14