
When the numbers are small, it's quick to grow an array list from 2 to 4 memory addresses, but it's a different story once the allocation gets closer to the maximum amount of space allowed in an array list (close to the 2MB limit). Would it be more efficient, at some point, to only grow the array by a fraction of the size it needs rather than doubling it? Obviously growing from 1MB to 2MB isn't really a big deal nowadays; HOWEVER, if you had 50,000 people per hour running something that doubled the size of an array like this, I'm curious whether that would be a good enough reason to change how this works, not to mention cutting down on unneeded memory (in theory).

A small graphical representation of what I mean: ArrayList a has 4 elements in it, and that is its current maximum size at the moment:

||||

Now let's add another item to the ArrayList. The internal code will double the size of the backing array even though we're only adding one thing. The ArrayList now becomes 8 elements large:

||||||||

At these sizes I doubt it makes any difference, but when you're allocating 1MB up to 2MB every time someone does something like load a file of around 1.25MB into an ArrayList, there's 0.75MB of unneeded space allocated.

To give you more of an idea, here is the code that currently runs in C# in the List<T> class of System.Collections.Generic. The way it works now is that it doubles the size of an array list (read: the backing array) every time a user tries to add something to an array that is too small. Doubling the size is a good solution and makes sense, until you're essentially growing it far bigger than you technically need it to be.

Here's the source for this particular part of the class:

private void EnsureCapacity(int min)
{
  if (this._items.Length >= min)
    return;

  // This is the part I'm referring to: an empty list starts at 4 elements,
  // and any non-empty list doubles the size of its backing array.
  int num = this._items.Length == 0 ? 4 : this._items.Length * 2;
  if ((uint) num > 2146435071U)   // clamp to the largest array length the runtime allows
    num = 2146435071;
  if (num < min)
    num = min;
  this.Capacity = num;
}
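
To actually watch this happen, here's a quick sketch (my own illustration, with output assuming the default growth policy shown above) that prints Capacity every time the backing array grows:

using System;
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        var list = new List<int>();
        int lastCapacity = -1;

        // Add items one at a time and report whenever the backing array grows.
        for (int i = 0; i < 100; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                Console.WriteLine($"Count = {list.Count,3}, Capacity = {list.Capacity,3}");
                lastCapacity = list.Capacity;
            }
        }

        // Expected output with the doubling policy above: capacities 4, 8, 16, 32, 64, 128.
    }
}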

I'm going to guess this is how dynamic array growth is handled in many programming languages, so it has probably been considered many times before; I'm just wondering whether this is the kind of efficiency tweak that could save system resources by a large amount at massive scale.

Codezilla
  • If you have a situation where the performance matters, then you can pre-size your list with `Capacity`. Otherwise, in nearly all situations, if you're growing your list all the way to some big N, you're likely to grow quite a bit more -- which is why doubling usually works out nicely. It also gives amortized asymptotic complexity of `O(1)` for each insert. – Cameron Jul 18 '14 at 19:07
  • What are you actually asking? Have you experienced any issues with this? And what is "less efficient", efficient in regards to what exactly? – Patrick Jul 18 '14 at 19:09
  • [Related question](http://stackoverflow.com/questions/1100311/what-is-the-ideal-growth-rate-for-a-dynamically-allocated-array). – Cameron Jul 18 '14 at 19:12
  • @Patrick It was just theoretical, me thinking too much :) In regards to wasted memory space but Servy mentioned something that makes sense in that you exchange memory space for computational speed because of how memory addressing works I think. – Codezilla Jul 18 '14 at 19:13
  • @Codezilla No, it's not about how memory addressing works, in this case it's because using more memory means you need to copy the data less often. Copying data takes time, not memory. – Servy Jul 18 '14 at 19:18
  • @Servy Aw that makes sense. – Codezilla Jul 18 '14 at 19:19

2 Answers


As the size of the collection gets larger, so does the cost of creating a new buffer, since you need to copy over all of the existing elements. The fact that the number of these copies is inversely proportional to the expense of each copy is exactly why the amortized cost of adding items to a List is O(1) when the capacity is doubled. If the size of the buffer instead increases linearly, then the amortized cost of adding an item to a List actually becomes O(n).

Growing more conservatively would save on memory, allowing the "wasted" memory to go from O(n) to O(1). As with virtually all performance/algorithm decisions, we're once again faced with the quintessential trade-off of exchanging memory for speed. We can save on memory and have slower adds (because of more copying), or we can use more memory to get faster additions. Of course there is no one universally right answer. Some people really would prefer slower additions in exchange for less wasted memory. The particular resource that will run out first varies with the program, the system it's running on, and so forth. People in situations where memory is the scarcer resource may not be able to use List, which is designed to be as widely applicable as possible even though it can't be universally the best option.
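
A rough way to see the copying side of that trade-off is a toy model that simply counts element copies under two growth policies. (The Simulate helper below is a made-up model for illustration, not List<T>'s actual implementation.)

using System;

class GrowthCopyCost
{
    // Counts how many element copies occur while inserting n items into a buffer
    // that starts at 'initial' capacity and calls 'grow' whenever it fills up.
    static long Simulate(int n, int initial, Func<int, int> grow)
    {
        long copies = 0;
        int capacity = initial;
        for (int count = 0; count < n; count++)
        {
            if (count == capacity)
            {
                copies += count;                              // a reallocation copies every existing element
                capacity = Math.Max(grow(capacity), count + 1);
            }
        }
        return copies;
    }

    static void Main()
    {
        const int n = 1_000_000;
        long doubling = Simulate(n, 4, cap => cap * 2);       // geometric growth
        long linear   = Simulate(n, 4, cap => cap + 1024);    // grow by a fixed 1024 slots

        Console.WriteLine($"Doubling:        {doubling,15:N0} copies (O(n) total, amortized O(1) per add)");
        Console.WriteLine($"Fixed increment: {linear,15:N0} copies (O(n^2) total, amortized O(n) per add)");
    }
}

And if you know the final size up front, you can sidestep both the copying and the over-allocation by setting Capacity (or using the List<T>(int capacity) constructor), as Cameron's comment suggests.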

Servy
  • Pretty good, although I guess I'd add that you could use a LinkedList and have O(1) additions and save on memory, at the cost of O(n) traversals/access. – Casey Jul 18 '14 at 19:11
  • @emodendroket `LinkedList` has an overhead *for each item in the list*. That overhead can easily be 2-3x. That would mean that a linked list's best case memory overhead is likely to be more than the *worst case* overhead of `List`. Each node in a linked list needs to have a reference to the data in question (the one piece of non-overhead), a link to the next node, and the type data for the object itself. And if it's a doubly linked list, it needs a previous reference as well. – Servy Jul 18 '14 at 19:14
  • Well, OK, fair enough, but that depends on the size of the objects we're putting in the collection, doesn't it? – Casey Jul 18 '14 at 19:15
  • @emodendroket Technically, yes, but in virtually all situations the objects are going to be word sized objects. If the object is larger than a word in size it should almost certainly be converted into a reference type instead. Having very large value types is generally a really bad idea. However if you did have a really large value type then yes, that's one of the rare situations where a `LinkedList` wouldn't have a large percentage of wasted memory. – Servy Jul 18 '14 at 19:17

The idea behind the exponential growth factor for dynamic arrays such as List<T> is that:

  1. The amount of wasted space is always merely proportional to the amount of data in the array. Thus you are never wasting resources on a more massive scale than you are properly using.

  2. Even with many, many reallocations, the total time spent copying while building an array of size N is O(N) -- i.e. amortized O(1) per element added.

  3. Access time is extremely fast at O(1) with a small coefficient.

This makes List<T> very appropriate for arrays of, say, in-memory tables of references to database objects, for which near-instant access is required but the array elements themselves are small.

Conversely, linear growth of dynamic arrays can result in n-squared memory wastage. This happens in the following situation:

  1. You add something to the array, expanding it to size N for large N, freeing the previous memory block (possibly quite large) of size N-K for small K.

  2. You allocate a few objects. The memory manager puts some in the large memory block just vacated, because why not?

  3. You add something else to the array, expanding it to size N+K for some small K. Because the previously freed memory block now is sparsely occupied, the memory manager does not have a large enough contiguous free memory block and must request more virtual memory from the OS.

  4. Thus the committed virtual memory grows quadratically even though the measured size of the objects created grows only linearly (the toy model below sketches this).
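
Here is a deliberately simplified sketch of that pattern (my own toy model, not real allocator behavior): it assumes every block freed by a resize has been fragmented by other allocations and is unusable, so each resize appends a fresh block to the address space.

using System;

class LinearGrowthFragmentationModel
{
    static void Main()
    {
        const int increment = 1024;        // linear growth: +1024 elements per resize
        const int resizes = 1000;          // number of times the array grows

        long addressSpace = 0;             // total address space ever handed out
        long live = 0;                     // size of the array currently in use

        // Each resize gets a brand-new block because the block just freed is
        // assumed to be partially reused and too fragmented to hold the larger array.
        for (int i = 1; i <= resizes; i++)
        {
            live = (long)i * increment;    // the array is now i * increment elements
            addressSpace += live;          // fresh block appended; the old block is wasted
        }

        Console.WriteLine($"Live array:            {live,15:N0} elements");
        Console.WriteLine($"Address space touched: {addressSpace,15:N0} elements (~live^2 / (2 * increment))");
    }
}

Growing a roughly million-element array 1,024 elements at a time touches about 500 million elements' worth of address space in this model, even though the live data never exceeds about a million elements.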

This isn't a theoretical possibility. I actually had to fix an n-squared memory leak that arose because somebody had manually coded a linearly-growing dynamic array of integers. The fix was to throw away the manual code and use the library of geometrically-growing arrays that had been created for that purpose.

That being said, I have also seen problems with the exponential reallocation of List<T> (as well as the similarly-growing memory buffer in Dictionary<TKey,TValue>) in 32-bit processes when the total memory required needs to grow past 128 MB. In this case the List or Dictionary will frequently be unable to allocate a 256 MB contiguous range of memory even if there is more than sufficient virtual address space left. The application will then report an out-of-memory error to the user. In my case, customers complained about this since Task Manager was reporting that VM use never went over, say, 1.5 GB. If I were Microsoft I would damp the growth of List (and the similar memory buffer in Dictionary) to 1% of total virtual address space.

dbc
  • Sounds like it's almost a bad idea to try and tweak expected memory usage in some cases. Good to know. – Codezilla Jul 18 '14 at 21:19