14

Is there a compiler optimization for the size() methods of Collections in Java?

Consider the following code:

for(int i=0;i<list.size();i++)
      ...some operation.....

There is a call to the size() methods for every i. Won't it be better to find out the size and reuse it? (Method calls have overheads).

final int len = list.size()
for(int i=0;i<len;i++)
      ...some operation.....

However, when I timed both these code pieces there was no significant time difference, even for i as high as 10000000. Am I missing something here?

Update1: I understand that the size is not computed again unless the collection changes. But there has to be some overhead associated with a method call. Is it the case that the compiler always inlines these (See Esko's answer)?

Update 2: My curiosity has been fueled further. From the answers given, I see that good JIT compilers will often inline this function call. But they will still have to determine whether the collection was modified or not. I am not accepting an answer in the hope that someone will give me pointers regarding how this is handled by compilers.

Cœur
  • 37,241
  • 25
  • 195
  • 267
athena
  • 5,579
  • 8
  • 30
  • 31
  • 1
    It is better not to worry about things like this until a profiler shows you that this is an actual bottleneck of your application which will probably never be the case. It is better to have more readable code than negligibly faster code. But from purely academic point of view, this is still a good question. – Sergei Tachenov Dec 14 '10 at 12:49
  • @Sergey: Yes. The simple test I performed showed me that I should not worry about the efficiency. Hence, the update. But this has fueled my curiosity. Please see my reply on Tom Anderson's comment. – athena Dec 15 '10 at 07:53

4 Answers4

18

Okay, here is an excerpt from the JDK sources (src.zip in the JDK folder):

public int size() {
    return size;
}

This is from ArrayList, but I think other collections have similar implementations. Now if we imagine that the compiler inlines the size() call (which would make perfect sense), your loop turns into this:

for(int i=0;i<list.size;i++)
// ...

(Well, let's forget that the size is private.) How does compiler checks if the collection was modified? The answer that it doesn't and doesn't need to do so because the size is already available in the field, so all it has to do is to access the size field on each iteration, but accessing an int variable is a very fast operation. Note that it probably calculates its address once, so it doesn't even have to dereference list on each iteration.

What happens when the collection is modified, say, by the add() method?

public boolean add(E e) {
    ensureCapacity(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

As you can see, it just increases the size field. So the compiler doesn't actually need to do anything to ensure it has access to the latest size. The only exception would be that if you modify the collection from another thread you need to synchronize, otherwise the loop thread may see its local cached value of size which may or may not be updated.

Sergei Tachenov
  • 24,345
  • 8
  • 57
  • 73
10

The value returned by collection's .size() method is usually cached and recalculated only when the actual collection is modified (new elements are added or old ones removed).

Instead of comparing for loop control scoping, try using the for each loop since that actually uses Iterator which in some collection implementations is a lot faster than iterating by using index.

Esko
  • 29,022
  • 11
  • 55
  • 82
  • 2
    For instance when using a LinkedList. – Guillaume Dec 14 '10 at 11:47
  • @Esko: By caching do you mean the field "size" in the subclasses like ArrayList? That is still a method call, isn't it? Or is it the case, that method calls do not have much overhead in java? – athena Dec 14 '10 at 11:49
  • @athena: Well yes, the value of the field is recalculated only on demand. JVM optimizes method invocations away over time by inlining the actual field access so technically especially in long running applications most method calls do not have any overhead at all. – Esko Dec 14 '10 at 12:02
  • @Esko: This is aside from the question, but could you give an example of a collection where using Iterator is faster than iterating by an index (apart from LinkedList)? – athena Dec 14 '10 at 12:04
  • 1
    @Esko: Thanks. Could you give me some pointers to articles / documentation on such inlining of fields? – athena Dec 14 '10 at 12:08
0

Calling the size() method of a collection is just returning an integer value that is already kept track of. There isnt much of a time difference because size() isnt actually counting the number of items but instead the number of items are kept track of when you add or remove them.

prgmast3r
  • 413
  • 5
  • 8
  • Yes and no. There is no requirement for a `Collection` to return it's own size in O(1). But most implementations do it. – Andreas Dolk Dec 14 '10 at 11:48
0

The java language specification explains, that the expression is evaluated on each iteration step. With you example, list.size() is called 10.000.000 times.

This doesn't matter in your case, because list implementations (usually) have a private attribute that stores the actual list size. But it may cause trouble, if the evaluation really takes time. In those cases it's advisable to store the result of the expression to a local variable.

Andreas Dolk
  • 113,398
  • 19
  • 180
  • 268
  • That is still a method call? Or is it the case, that method calls do not have much overhead in java? – athena Dec 14 '10 at 12:12
  • @athena: A good JIT compiler - like the one in Sun's JVM - will very often be able to inline the method call, turning it into a simple load, which is about as fast as it gets. – Tom Anderson Dec 14 '10 at 12:45
  • @Tom: But the compiler will still have to determine whether the collection was modified. Could you give me some pointers as to finding out what (Sun's) JVM does? – athena Dec 15 '10 at 08:00
  • @athena - with inlining, the execution code may contain a reference to the (private) size field after optimisation. That's perfectly safe, if the method is a simple getter that returns the reference/value of a private field. Like most `size()` methods. – Andreas Dolk Dec 15 '10 at 10:17