3

Say I'm adding N items to an ArrayList in Java. What is the worst-case run time for this? I know it can be O(N) to add a single item because the array may have to resize. It won't resize N times as I add N items or even a factor of N because (AFAIK) the ArrayList increases in capacity by some factor each time it resizes. That would mean some kind of log(N) number of resizes. So it seems like it should be O(N log(N)) to insert N items, but I'm not completely sure about this. An old computer science exam I'm looking at had the answer as O(N^2). Am I missing anything?

int newCapacity = (oldCapacity * 3)/2 + 1; (from this answer)

Community
  • 1
  • 1
sudo
  • 5,604
  • 5
  • 40
  • 78
  • O(N^2) is the cost bound if each resize increases array capacity by the same amount. Perhaps that's what the exam question was asking about. – John Bollinger May 09 '15 at 19:17
  • Actually, I found out that the exam question meant insertion into any position. That includes the front, where it would need to shift everything down. That's O(N^2). – sudo May 09 '15 at 20:06

1 Answers1

5

The dynamic array is well-studied in computer science in amortized time analysis. The short answer is that when starting from an empty dynamic array and adding N elements, the total time is O(N).

You are correct that adding a single item has the worst-case time of O(N) when a resize must be performed, and that O(log N) resizes take place.

But when we add up these resize operations, the total is only O(N), which is very good. Here's an example to illustrate when the scaling factor is 2 (instead of ArrayList's scaling factor of 3/2):

N = 64: Resize to 1, 2, 4, 8, 16, 32, 64. Total operations = 127 (roughly 2N).

Nayuki
  • 17,911
  • 6
  • 53
  • 80
  • Shouldn't an O(N) operation performed log(N) times result in O(N log(N)), like how it is with adding to a TreeMap? I know adding N items to a balanced tree (not making something really spindly) is considered O(N log N). – sudo May 09 '15 at 18:46
  • Technically yes, but N log N is a *pessimistic* analysis. O(N) is the tight analysis. – Nayuki May 09 '15 at 18:48
  • The key point is that the *N* is different at each resize, and adding them up gives about 2 *N*. – Nayuki May 09 '15 at 18:49
  • 1
    Think of it this way, after 32 insertions you reallocate the storage and copy 32 elements; after 64 insertions you do it for 64 elements. When you double the size of your contaier, you spend twice as long for reallocations but you also have to do it only half as often. – cababunga May 09 '15 at 18:52
  • I see what you mean. I'm going to try this on paper real quick to make sure I believe it. – sudo May 09 '15 at 18:56
  • 1
    Yeah, I can prove by induction that it's roughly 2N operations for any N. I'm assuming it'll be f*N in general, where f is the array resize factor. – sudo May 09 '15 at 19:16
  • 1
    Close, but it's actually f / (f - 1) * N due to the geometric series. =) Bigger factors have less time overhead but waste more space. – Nayuki May 09 '15 at 19:35