4

What is the big O time complexity of allocating an array in .Net?

I'm guessing that if the array is small enough to fit on the ephemeral segment it should be O(1), but that as n gets larger it gets more difficult to find enough memory so it may change.

Also the large object heap may be fragmented, so if n is large enough for the array to end up on the LOH, it probably won't be O(1).
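
To make this concrete, a rough way to probe it empirically would be a naive timing loop like the one below (not a proper benchmark; something like BenchmarkDotNet would give far more reliable numbers). It times allocation only, not filling the array with values:

```csharp
using System;
using System.Diagnostics;

class AllocTiming
{
    static void Main()
    {
        // Sizes chosen to span both sides of the ~85,000-byte LOH threshold.
        foreach (int n in new[] { 1_000, 10_000, 100_000, 1_000_000, 10_000_000 })
        {
            const int reps = 100;
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < reps; i++)
            {
                var a = new byte[n];
                GC.KeepAlive(a);   // keep the allocation from being optimized away
            }
            sw.Stop();
            Console.WriteLine($"n = {n,10}: {sw.Elapsed.TotalMilliseconds / reps:F4} ms per allocation");
        }
    }
}
```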

Yair Halberstadt
  • "but that as n gets larger it gets more difficult to find enough memory so it may change." But the entirety of free memory is in one contiguous block. You seem to be thinking of the C++ memory model where objects don't move in memory by the GC. The LOH isn't compacted, unlike the small object heap, so the statement would be applicable to it. – Servy Dec 04 '18 at 21:51
  • Does the LOH use malloc style memory management? – Yair Halberstadt Dec 04 '18 at 21:58
  • It doesn't compact the memory. I'm not sure how similar they are beyond that (or even how different various C/C++ implementations tend to be, for that matter). – Servy Dec 04 '18 at 22:04
  • O(1) where the cost of the 1 operation may vary wildly is really the best you can usefully do with any remotely typical allocation – zzxyz Dec 04 '18 at 22:05
  • Well, an array needs to be zero-initialized. I suppose that gets you O(n) as well. – IS4 Dec 04 '18 at 22:15
  • The GC preinitialises everything to 0 when it asks for a new segment @IllidanS4 – Yair Halberstadt Dec 04 '18 at 22:17
  • And even if it didn't a large memset to 0 is still likely to be cheaper than the rest of the allocation mechanisms. – zzxyz Dec 04 '18 at 22:23

2 Answers

4

A new array will be allocated on one of two distinct heaps, as most are probably aware, depending on its size (the size threshold is 85,000 bytes; a small sketch after the list shows how to check on which heap a given array ends up):

  • Small Object Heap - allocation here happens in a so-called allocation context, which is a pre-zeroed region of memory located inside an ephemeral segment. Two scenarios may happen here:
    • there is enough space for the new array in the current allocation context - in that case we can treat it as an O(1) operation of just returning an address for the array (and bumping the pointer for the next objects)
    • there is not enough space there - the allocation context will be enlarged by an allocation quantum (usually around 8 kB) if that is possible (i.e. if it lies at the end of the ephemeral segment). Here we hit the cost of zeroing those 8 kB, so it is significantly bigger. Even worse, it may not be possible to enlarge the allocation context at all, because it may lie between already allocated objects. In that case a new allocation context will be created somewhere inside the ephemeral segment with the help of the free list, to make use of the fragmentation. The cost here is even bigger - traversing the free list to find a proper place and then zeroing it. Still, the cost does not depend on the array size directly and is "constant", so we can treat it as O(1) like previously.
  • Large Object Heap - because allocations here are by default much less frequent, it uses "ad-hoc" allocation contexts - each time an allocation happens here, the GC searches for an appropriate place with the help of the free list and zeroes it. Again, both the cost of free-list traversal and the cost of memory zeroing apply here, but as objects are big, it is mainly dominated by the zeroing cost. Here we can talk about an O(n) cost.
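
As a side note, here is a minimal sketch to check on which heap a given array lands. It relies on the commonly observed behavior that freshly allocated LOH objects report generation 2 from GC.GetGeneration, while a fresh SOH array reports generation 0 (the exact boundary also accounts for the object header, so the sizes below deliberately stay well away from it):

```csharp
using System;

class HeapPlacement
{
    static void Main()
    {
        var small = new byte[1_000];     // well below the 85,000-byte threshold -> SOH
        var large = new byte[100_000];   // well above the threshold -> LOH

        // The LOH is logically part of generation 2, so the large array is
        // expected to report generation 2 right after allocation, while the
        // small one reports generation 0.
        Console.WriteLine($"small array: gen {GC.GetGeneration(small)}");
        Console.WriteLine($"large array: gen {GC.GetGeneration(large)}");
    }
}
```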

In case of LOH allocations one should be aware of an additional hidden "cost" - such allocations do not happen during some parts of Background GCs (because both operate on the free list). So if you happen to have a lot of long Background GCs, LOH allocations will be paused, waiting for the GC to end. This obviously introduces unwanted delays for your threads.
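
If you suspect such delays, one way to at least see how much time is being spent in GC pauses is the GC.GetGCMemoryInfo API. A minimal sketch follows; note it requires .NET 5 or later and only reports overall pause information, not LOH allocation waits specifically:

```csharp
using System;

class GcPauseProbe
{
    static void Main()
    {
        // Put some pressure on the LOH with a few large allocations.
        var keep = new byte[16][];
        for (int i = 0; i < keep.Length; i++)
            keep[i] = new byte[1_000_000];

        GC.Collect();

        // GCMemoryInfo exposes pause statistics for recent collections (.NET 5+).
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        Console.WriteLine($"Pause time percentage: {info.PauseTimePercentage}%");
        foreach (TimeSpan pause in info.PauseDurations)
            Console.WriteLine($"Pause: {pause.TotalMilliseconds:F3} ms");

        GC.KeepAlive(keep);
    }
}
```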

Konrad Kokosa
  • "because it may lie between already allocated objects" - that could only happen if there are pinned objects, right? Otherwise it should be compacted. – Thomas Weller Dec 05 '18 at 12:10
  • Do you happen to have a good qualified/official link to the terms (e.g. allocation quantum) and numbers (8k) you give? Maybe also for the data structure that is used internally for the free list etc.? I didn't find good ones. – Thomas Weller Dec 05 '18 at 12:16
  • In general SOH generations are sometimes just swept, without compaction. In such case, the allocation context may reuse free space (fragmentation). Agree - it may happen rarely as gen0/1 is compacted often. And pinning always may "help" in fragmentation. – Konrad Kokosa Dec 05 '18 at 12:18
  • An official link would be the [BOTR](https://github.com/dotnet/coreclr/blob/master/Documentation/botr/garbage-collection.md), but it explains this only in general. You will find much more detail in my Pro .NET Memory Management book, which was reviewed by Maoni, so it can be treated as semi-official ;) – Konrad Kokosa Dec 05 '18 at 12:21
2

Objects in the ephemeral segment (SOH; small object heap) are allocated after the last known object on that segment. Allocation should really be just bumping a pointer to that position.

The "empty" space in between will not be considered, since there is no empty space. Even if the object has no reference any more, it will still be there until it's garbage collected. Then, the SOH will be compacted, so again, there are no free spaces.

If the current segment is not large enough, then a different one has to be chosen or a new segment has to be created. This will take longer, but it is still O(1).
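
Conceptually, this is just a bump-pointer allocator. Below is a deliberately simplified, hypothetical model of it (the real GC works on raw memory segments; this only illustrates why the happy path is O(1)):

```csharp
// Hypothetical illustration only - not how the CLR is actually implemented.
class BumpAllocator
{
    private readonly byte[] _segment;   // stands in for an ephemeral segment
    private int _next;                  // the "allocation pointer"

    public BumpAllocator(int segmentSize) => _segment = new byte[segmentSize];

    // Returns the offset of the newly "allocated" block, or -1 if the segment
    // is exhausted (the real GC would then grow the allocation context, pick
    // another segment or trigger a collection).
    public int Allocate(int size)
    {
        if (_next + size > _segment.Length)
            return -1;

        int offset = _next;
        _next += size;                  // O(1): a bounds check plus a pointer bump
        return offset;
    }
}
```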

The LOH is a bit more complex, since it will usually not be compacted. There are websites that state that the LOH has a "free list". But I'm not sure whether it's really a list-style implementation. I guess it has better management and works like a dictionary, so it should not be worse than O(log(n)).

What needs to be done?

  • perhaps get new memory from the kernel. If so, the memory was already zeroed and memset() is not needed.
  • if that new memory is not available in RAM, swap something to disk first. This part may become really expensive, and it is unpredictable.
  • If memory is already available in .NET, it might need to be initialized to zero. But the implementation of memset() is optimized (e.g. using rep stos)
  • Initialize the array with values from somewhere (e.g. a file). This will likely be a .NET loop and, apart from swapping, one of the expensive parts (see the sketch below).
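
To illustrate the last two points: the array comes back already zeroed, while putting real values into it is a separate O(n) pass. A minimal sketch:

```csharp
using System;

class InitCost
{
    static void Main()
    {
        // The runtime hands back zeroed memory - no loop in user code needed.
        int[] data = new int[1_000_000];
        Console.WriteLine(data[123]);    // prints 0

        // Filling it with actual values is an explicit O(n) pass, whether done
        // with a loop like this or a helper such as Array.Fill on newer runtimes.
        for (int i = 0; i < data.Length; i++)
            data[i] = 42;
        Console.WriteLine(data[123]);    // prints 42
    }
}
```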

Usually, I would not consider the allocation of memory something to worry about, unless you have used a profiler (like dotMemory) that told you about memory throughput issues. Trust Donald Knuth: "premature optimization is the root of all evil".

Thomas Weller
  • Thanks. That sounds like it makes sense. In general is the cost of allocating an array going to be trivial compared to initialising it with values, even on the LOH? – Yair Halberstadt Dec 04 '18 at 22:16
  • Thanks. The actual context is that I'm writing a method where worst case performance is terrible as a result of list copying, which I'm trying to optimise. Now I'm using a little bit of calculus to work out how best to maximise performance, but I'm assuming that copying a list is O(n). But then I thought to myself - that's for the copying. What is the cost of allocating the array in the first place? – Yair Halberstadt Dec 04 '18 at 22:36
  • @YairHalberstadt - When it comes to performance problems, the GC is far more likely to be the culprit than allocation. Doubly so on non-desktop platforms. – zzxyz Dec 04 '18 at 23:38
  • Hi @thomas-weller, you've mixed some things here and there. Like, for SOH "The "empty" space in between will not be considered, since there is no empty space." is not entirely true. I've provided some more detailed information in my answer. – Konrad Kokosa Dec 05 '18 at 10:38