
I've been looking into how dynamic arrays actually work in general. What I found are two different approaches.

In C++

In C++, a dynamic array is generally implemented as a vector. The vector sets the capacity to 0, increments the count as new elements are inserted, and doubles the capacity when more room is needed.

vector.h

/*
 * Implementation notes: Vector constructor and destructor
 * -------------------------------------------------------
 * The constructor allocates storage for the dynamic array and initializes
 * the other fields of the object.  The destructor frees the memory used
 * for the array.
 */

template <typename ValueType>
Vector<ValueType>::Vector() {
   count = capacity = 0;
   elements = NULL;
}

For expanding the vector size

/*
 * Implementation notes: expandCapacity
 * ------------------------------------
 * This function doubles the array capacity, copies the old elements into
 * the new array, and then frees the old one.
 */

template <typename ValueType>
void Vector<ValueType>::expandCapacity() {
   capacity = max(1, capacity * 2);
   ValueType *array = new ValueType[capacity];
   for (int i = 0; i < count; i++) {
      array[i] = elements[i];
   }
   if (elements != NULL) delete[] elements;
   elements = array;
}

In Java

In Java, a dynamic array is implemented as an ArrayList. The capacity defaults to 10 (depending on the JVM implementation), and once the capacity is full it is increased by a growth factor. The reason for defaulting to 10 is that memory does not have to be reallocated for every early insertion; only once the capacity is full is the backing array grown.

Curiosity

Why does the implementation in vector.h set the default capacity to 0? Setting the capacity to some small value (say, 10) instead of 0 might save the overhead of reallocating memory each time the user inserts an element.

As it is a dynamic array, setting a small initial capacity would do no harm, because the size of a dynamic array generally grows beyond 10 anyway.

Edit: My question is: why default to 0? It could be any small value by default, because the vector is going to expand to some specific size anyway; that's the purpose of using a vector in the first place.

einpoklum
secretgenes

3 Answers


Having a capacity of zero by default has the advantage that default-constructing an std::vector doesn't do any dynamic memory allocation at all (you don't pay for what you don't need). If you know you need ~10 elements, you can set the capacity explicitly by calling std::vector::reserve:

std::vector<int> vec;
vec.reserve(10);

I can only speculate why Java does things differently, but afaik, dynamic memory allocation is cheaper in Java than in C++, and the two languages also follow different philosophies when it comes to performance/low-level control versus simplicity.

MikeMB
  • I'd get rid of the afaik comment. It adds little and will attract trolls. The rest looks correct, particularly the differing ideologies at play "Don't need, don't pay" vs "make it hard for the programmer to screw up." – user4581301 Aug 08 '17 at 20:50
  • But that doesn't add up with doubling capacity! – SergeyA Aug 08 '17 at 20:55
  • 1
    Or (if you need 10 elements) you can use the vector ctor that gives you that. – Jesper Juhl Aug 08 '17 at 20:55
  • @Jesper: If you do know the exact number at the time you create the vector, then yes. If the vector is supposed to hold e.g. all substring matches then you can't really do that. – MikeMB Aug 08 '17 at 21:01
  • @SergeyA: what doesn't add up? Also, I seem to remember, that the actual growth factor is not 2, but that is not really relevant here – MikeMB Aug 08 '17 at 21:02
  • 2
    In Java, objects are always reference types, so your default value (i.e. `null` as opposed to default-constructed) already has minimal overhead. – Jorn Vernee Aug 08 '17 at 21:03
  • @JornVernee: Good point, but there is, afaik (it has been some time since I've done Java programming), a difference between a default-constructed (i.e. empty) dynamic array and a null reference: you can e.g. directly add elements to the former or ask for its current size, but you can't do the same with the latter. – MikeMB Aug 08 '17 at 21:08
  • @user4581301: I've seen measurements that back up that statement (one reason being that the jvm can just use stack allocation by default), but I can't cite a good source from the top of my head. If someone wants to challenge that part of the statement I could probably find some via Google. – MikeMB Aug 08 '17 at 21:13
  • @MikeMB Yes, but it might be an argument for starting out with 10 elements. The assumption is that as soon as such an array is initialized, you want to start using it. But if you want to conserve space when not using it, you can always use `null`. – Jorn Vernee Aug 08 '17 at 21:23
  • @JornVernee Increasing the capacity of an empty c++ vector does not result in any objects being constructed. Just a block of memory getting allocated. – juanchopanza Aug 08 '17 at 21:24
  • @juanchopanza Sorry, I was talking about the vector itself being default constructed, as opposed to just having a `null` reference to an `ArrayList`. – Jorn Vernee Aug 08 '17 at 21:34
  • @Jorn: ok, I think I see your point. You could of course do the same in c++ by using a pointer, but it would cost you an (otherwise unnecessary) additional memory allocation. Of course you have the same cost in Java, but there you can't avoid it anyway, so it is a more reasonable option there. – MikeMB Aug 08 '17 at 21:49

Why default 0?

It's not 0 by default, actually. That is, the C++ language standard does not define (AFAIK) what the initial capacity of a default-constructed vector should be.

In practice, most/all implementations default to 0 capacity. The reason, I would say, lies in one of the design principles of the C++ language:

What you don't use, you don't pay for.

(see: B. Stroustrup: The Design and Evolution of C++. Addison Wesley, ISBN 0-201-54330-3. March 1994)

And this is not just a trivial tautology - it reflects a real slant in the design considerations.

So, in C++, we would rather pay nothing for a vector which has no elements inserted into it than save on potential size increases by making an initial "sacrifice".

As @yurikilocheck points out, however, the std::vector class does provide you with a reserve() method, so you can set an initial capacity yourself - and since the default constructor doesn't allocate anything, you will not have paid "extra" for two allocations, just one. Again, you pay (essentially) the minimum possible.

Edit: On the other hand, std::vector partially breaks this principle in allocating more space than it actually needs when hitting its capacity. But at least it's not a superfluous allocation call.

einpoklum
  • Doubling capacity fails don't use, don't pay. – SergeyA Aug 08 '17 at 20:55
  • 1
    @SergeyA It kind of does, because it involves only one allocation. Most of the time it is cheaper than allocating only the required capacity. – juanchopanza Aug 08 '17 at 20:57
  • Cheaper, yes, but more memory hungry. It could make the difference between the program running and the program crashing with an out-of-memory condition. – SergeyA Aug 08 '17 at 20:59
  • 1
  • @SergeyA: No, it can't make that difference unless you've allocated your entire virtual memory space essentially. Remember that you're allocating virtual memory, not real memory. Also, after construction you can control capacity yourself and make sure not to just push back stuff, while with the default construction you're more vulnerable. – einpoklum Aug 08 '17 at 21:01
  • @SergeyA Yes indeed. It is a trade-off. The alternative would be to allocate and copy all the elements each time an item is pushed into the vector. Then you end up paying a high cost each time you do something quite trivial. – juanchopanza Aug 08 '17 at 21:01
  • @einpoklum not really. First, you might have swap disabled, and request no uncommitted memory to exist (which is kinda usual for my env). So argument about virtual memory doesn't stand. Second, what do you mean by not pushing back? You want to push back one element, not double the size! – SergeyA Aug 08 '17 at 21:08
  • @juanchopanza, I understand. I personally would prefer for vector to drop amortized constant insertion support, and instead give the users control over it by reserving properly. But apparently, users can't be trusted with control :) – SergeyA Aug 08 '17 at 21:09
  • @SergeyA: Well, now I have another comeback; see my edit. Also, if you request no "uncommitted" memory to exist - then pay attention to your vectors' capacity, or just don't use auto-allocating containers. Or just catch exceptions, and then your program won't crash. – einpoklum Aug 08 '17 at 21:09
  • @einpoklum, that's what you end up doing - not using vectors. Catching exceptions doesn't help, you still can't insert the element! Crash or no crash. Btw, I have dealt with implementation which uses 32 as default vector size with no elements. Killed my program. – SergeyA Aug 08 '17 at 21:10
  • @SergeyA: Beyond a certain point you give up on all of the standard library containers. `std::vector` is just more resilient than the rest since it's pretty straightforward. But some companies have a small-vector replacement where the vector starts out on the stack up to a certain size and then goes to the heap. – einpoklum Aug 08 '17 at 21:12

I have used an implementation which reserved a default of 32 elements per vector: the native Sun C++ STL implementation. It was a disaster. I had a perfectly reasonable program which by its nature had to hold somewhere around hundreds of thousands of those vectors, and it was running out of memory. It should have run fine, since only a small percentage of those hundreds of thousands of vectors were non-empty.

So from personal experience, 0 is the best default vector size.

SergeyA