3

Lately I've been writing a lot of code in C, and I've noticed that the thing that takes the most time in my programs is resizing dynamic data structures. Let's say we have an array of characters and I want to append some characters to the end of it. I do it this way (a rough sketch follows the list):

  1. Check if there is enough memory allocated.
  2. If not, realloc the array to twice its current size (using realloc).
  3. If there is now enough memory, append the characters; otherwise go back to step 2.
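
For reference, here's roughly what my append looks like (a simplified sketch; the `charbuf` struct and the `cb_append` name are just made up for this question):

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;
    size_t len;   /* characters currently stored */
    size_t cap;   /* characters the buffer can hold */
} charbuf;

/* Append n characters, doubling the capacity until they fit. */
int cb_append(charbuf *b, const char *src, size_t n)
{
    if (b->len + n > b->cap) {
        size_t newcap = b->cap ? b->cap : 16;
        while (newcap < b->len + n)
            newcap *= 2;                  /* step 2: double until big enough */
        char *p = realloc(b->data, newcap);
        if (!p)
            return -1;                    /* out of memory, keep the old buffer */
        b->data = p;
        b->cap  = newcap;
    }
    memcpy(b->data + b->len, src, n);     /* step 3: append */
    b->len += n;
    return 0;
}
```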

I'm just wondering if this is efficient enough. For example, I could imagine something like a tree structure where I would hold the arrays in the nodes: instead of copying the old elements into a new array before appending something, I would just add a newly malloc'ed chunk as the next node and append the characters there. That way I would avoid the unnecessary copying (rough sketch below)...
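
Something like this is what I have in mind (again just a rough sketch with made-up names; a singly linked list of fixed-size chunks):

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 1024

/* One node of the structure: a fixed-size chunk plus a link to the next one.
 * The caller creates the first chunk with calloc(1, sizeof(chunk)). */
typedef struct chunk {
    char          data[CHUNK_SIZE];
    size_t        used;            /* bytes used in this chunk */
    struct chunk *next;
} chunk;

/* Append n characters without ever copying what is already stored:
 * fill the current tail chunk and allocate new chunks as needed. */
int chunks_append(chunk **tail, const char *src, size_t n)
{
    while (n > 0) {
        if ((*tail)->used == CHUNK_SIZE) {       /* tail is full: add a node */
            chunk *c = calloc(1, sizeof *c);
            if (!c)
                return -1;
            (*tail)->next = c;
            *tail = c;
        }
        size_t room = CHUNK_SIZE - (*tail)->used;
        size_t take = n < room ? n : room;
        memcpy((*tail)->data + (*tail)->used, src, take);
        (*tail)->used += take;
        src += take;
        n   -= take;
    }
    return 0;
}
```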

So that's just one way of doing the resizing differently. Should I look for something else, or is my solution of simply copying the old elements into a new array of twice the size efficient enough?

qiubit
  • Questions seeking opinion-based answers are inappropriate here so I've voted to close the question. One important thing for you to consider though -- how many times do you think you might need to double the memory (and how much memory will you have available to continue to allocate)? Will you reach a point where doubling just to add one more entry is far too expensive in system resources? If you can reasonably use a linked list / tree / etc., rather than a single large buffer, you probably should. – mah Jun 04 '15 at 16:23
  • In general, sophisticated trees are never worth the time spent implementing them. Realloc with a factor of 1.5 is just fine. – user3125367 Jun 04 '15 at 17:27
  • @user3125367 Yeah, one should really use a language which provides them already. C isn't one of them. – Peter - Reinstate Monica Jun 04 '15 at 17:33
  • @PeterSchneider Your comment seems a little sarcastic (that's ok). Actually, sophisticated trees are hard in any language unless the implementation is hidden behind some interface. And then you pay that interface's overhead, because mapping indexes and iterators to "trivial userland" becomes non-trivial. One can take any of the zillions of ADT libraries from SourceForge etc. and use it, so C *does* provide tools. The worth of the complexity is still a big question, though. – user3125367 Jun 05 '15 at 13:52
  • @user3125367 Well, I have only recently taken up serious programming in C# at my job and I am -- against my prejudice -- impressed with what the language (and VS IDE) offers and how easy it makes many things. But then C#'s collections do not contain a tree, at least not explicitly, so I'm glad I didn't say C# in my first comment ;-). An explicit tree may be less useful than one thinks, cf. http://stackoverflow.com/questions/205945/why-does-the-c-stl-not-provide-any-tree-containers (discussing C++, also without trees). – Peter - Reinstate Monica Jun 06 '15 at 17:35
  • @PeterA.Schneider The tricky thing for me is that you can often outperform the trees in many languages without getting so sophisticated nowadays by just writing very cache-friendly code, like a much simpler but cache-friendly hash table for searching as opposed to sophisticated red-black tree implementation, even if the hash table has its share of collisions and performs more steps on average than the RB tree. It's one of the reasons I'm writing more data structures again in C these days as opposed to C++ -- I'm finding I get better performance focusing on memory layouts [...] –  Jan 01 '18 at 09:11
  • [...] and where individual bits and bytes are stored and accessed exactly these days more than using the most sophisticated algorithms. C makes the former a bit easier and more direct to do than C++, while I find C++ a lot easier when I genuinely want a sophisticated data structure. Of course I'm not talking about data structures that require linear-time complexity searching or anything like that... but generally data structures of a much simpler but cache-friendly variety. –  Jan 01 '18 at 09:12
  • @PeterA.Schneider As an example, I actually jumpstarted my career in the VFX industry by creating a somewhat complex n-ary adjacency tree to accelerate collision detection for soft bodies (meshes that deform every frame). That was back in the early 90s. But nowadays I'm finding I can beat my old data structure by just writing a stupid multi-dimensional hash table in C. That wasn't the case back when I started, because the performance differences between registers, CPU cache, and DRAM weren't as epic as they are these days. Now dumber algorithms are winning provided they're cache-friendly. –  Jan 01 '18 at 09:17
  • As a result, almost paradoxically, I'm finding more reasons to write code in C to stay competitive these days than 25 years ago as well as low-level GPU code along with SIMD instrinsics and sometimes even assembly. That said I do work in performance-critical areas related to mesh processing and raytracing... but it's funny -- with all the hardware advancements, I'm finding more of a need to write low-level code, not less, due to the changing characteristics of the hardware (it's not becoming faster for everything; it's becoming faster for very specific use cases and memory access patterns). –  Jan 01 '18 at 09:22
  • @DrunkCoder Interesting insights, thanks. – Peter - Reinstate Monica Jan 02 '18 at 14:21

5 Answers

6

As with all "most efficient" questions, the only way to know is to measure the actual performance in real-life conditions.

Generally speaking, there can be big performance gains from having items in consecutive memory locations, as that's almost always the most cache-friendly layout for any algorithm that's processing the items one after the other. Overall, a consecutive array uses less space, as well, simply because you don't need extra space to store additional data (like tree nodes).

While doubling the size when you need to grow is a common approach, it tends to eat up address space. Some STL implementations grow structures like vectors by 1.5x instead of 2x to avoid certain pathological cases. In the short term this means more allocations, but it can lead to less memory fragmentation. Whether that applies when you're using realloc depends on how often realloc succeeds in resizing the block in place.
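
For illustration, a 1.5x growth step might look roughly like this (a sketch, not the code of any particular STL implementation; the function name is made up):

```c
#include <stdlib.h>

/* Grow *cap by roughly 1.5x per step until `needed` bytes fit, then realloc.
 * Returns the new buffer, or NULL on failure (the old buffer stays valid). */
static char *grow_1_5x(char *buf, size_t *cap, size_t needed)
{
    size_t newcap = *cap ? *cap : 16;
    while (newcap < needed)
        newcap += newcap / 2 + 1;   /* ~1.5x per step, always makes progress */
    char *p = realloc(buf, newcap);
    if (p)
        *cap = newcap;
    return p;
}
```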

Adrian McCarthy
  • I've also read some time ago about 1.5x causing less fragmentation, but can't find the link. So it seems to be not as opinion-based as the commenter above said. Note to OP: if you're actively resizing and also plan to shrink on remove (i.e. not only grow), avoid using the same factor for both. E.g. grow at 1.5x, but shrink only at 0.5x. This will prevent realloc jitter in add/remove cases near the boundary. – user3125367 Jun 04 '15 at 17:21
2

Usually anything with logarithmic time complexity (like your reallocation scheme) is considered "good enough". In general you would only save the copy operation with a list of chunks, but you would still have to hit the free store for the new chunks, which is always comparatively costly.

Your best bet is probably to apply reasonable heuristics for pre-allocation so that you avoid reallocations altogether in most cases. Memory is plentiful today, at least if the other processes show some discipline ;-).
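
For example, something along these lines (a sketch; the estimate, the headroom, and the 4096-byte floor are made-up placeholders for whatever your problem suggests):

```c
#include <stdlib.h>

/* Sketch of a pre-allocation heuristic: reserve an estimated capacity up
 * front so the common case never reallocates at all. */
char *alloc_with_headroom(size_t expected_items, size_t item_size, size_t *cap_out)
{
    size_t estimate = expected_items * item_size;
    if (estimate < 4096)
        estimate = 4096;            /* never start with a tiny buffer */
    estimate += estimate / 4;       /* 25% headroom in case the estimate is off */
    *cap_out = estimate;
    return malloc(estimate);
}
```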

Peter - Reinstate Monica
1

After the accepted answer

Efficiency depends on many issues, and profiling will give you an idea for a given situation.

Certainly growing by a scale factor like 2.0, 1.5, or something in that neighborhood provides nice logarithmic behavior: the number of reallocations grows only logarithmically with the final size.

Two issues not covered elsewhere: shrinking and optimistic allocation.

Shrink: Not only do structures grow, they may also shrink. Like growth, reduction should be logarithmic. However, with reduction, thrashing can occur if the grow and shrink thresholds are too close to each other. I recommend spacing the growth and shrink thresholds evenly on a logarithmic scale. Example with 4x steps (a sketch follows the list):

Grow on size 1, 4, 16, 64, 256 ...
Shrink on 2, 8, 32, 128 ...
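
A possible sketch of that hysteresis (illustrative only: grow when the element count exceeds the capacity, shrink only when it falls to an eighth of it, so the two thresholds never meet):

```c
#include <stdlib.h>

/* Sketch of grow/shrink hysteresis with 4x capacity steps. */
static int adjust_capacity(char **buf, size_t *cap, size_t count)
{
    size_t newcap = *cap ? *cap : 1;

    while (count > newcap)
        newcap *= 4;                 /* grow through 1, 4, 16, 64, 256, ... */
    while (newcap >= 16 && count <= newcap / 8)
        newcap /= 4;                 /* shrink triggered near 2, 8, 32, 128, ... */

    if (newcap != *cap) {
        char *p = realloc(*buf, newcap);
        if (!p)
            return -1;               /* keep the old buffer on failure */
        *buf = p;
        *cap = newcap;
    }
    return 0;
}
```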

Optimistic allocation: Systems with large memory capacity use virtual address spaces that are often far larger than the physical address space. When allocating memory, say 1,000,000 bytes, no physical memory is actually committed until it is used (e.g. written to). Even then, it is committed only in chunks of some page size, like 4 KB or 64 KB. See Why is malloc not "using up" the memory on my computer?

chux - Reinstate Monica
1

The basic idea of growing allocations by a factor of two is solid, because it makes the total reallocation work linear in the maximum array size (each element is copied roughly once on average, because half of the elements are never moved at all).

The only optimizable point I see in your implementation is that you reallocate several times when the size of the array has to jump: if you have an array of 64 bytes and need 4096 bytes, your realloc() calls first copy 64 bytes, then 128 bytes (64 of which are uninitialized), then 256 bytes, and so on, copying 4032 bytes in total and doing six allocations along the way. Yet all that's needed is to grab memory for 4096 bytes and copy over the 64 existing bytes. Write your code in a way that calls realloc() at most once each time you grow the allocation.

Another point that can kill performance is a starting size that is too small. A malloc() call takes more than 250 CPU cycles, and an allocation has an overhead of at least two pointer-sized values. You want these costs to be small relative to the use you get out of the memory. As such, there is usually no point in an initial allocation for only a single element. You want the initial allocation to be at least in the range of 64 bytes (which keeps the bookkeeping overhead at around twenty-five percent).
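
Putting both points together, an ensure-capacity helper might look like this (a sketch; the name and the 64-byte floor are just illustrative):

```c
#include <stdlib.h>

/* Make sure the buffer can hold at least `needed` bytes, calling realloc()
 * at most once: start at 64 bytes and keep doubling the *target* capacity
 * until it is large enough, then do a single realloc to that size. */
static int ensure_capacity(char **buf, size_t *cap, size_t needed)
{
    if (needed <= *cap)
        return 0;                       /* nothing to do */

    size_t newcap = *cap > 64 ? *cap : 64;
    while (newcap < needed)
        newcap *= 2;                    /* compute the final size first... */

    char *p = realloc(*buf, newcap);    /* ...then grab it in one call */
    if (!p)
        return -1;                      /* keep the old buffer on failure */
    *buf = p;
    *cap = newcap;
    return 0;
}
```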

cmaster - reinstate monica
-2

Reallocating the array is definitely not the most efficient approach. realloc() guarantees a contiguous block of memory. Since growing the block in place may not be possible, realloc may end up allocating a new block of memory, copying all of your data into it, and releasing the original block. This can be very expensive, as it is a linear operation in the size of your data, with a pretty large constant overhead.

You should really evaluate whether you need a contiguous block of memory (the tree structure you are describing is an example of a non-contiguous data structure). If you do, you should try to come up with a technique to estimate how much memory you will need, and minimize reallocating.

It seems to me that you could store your data in an array of pointers to other character arrays. Since the top-level array stores only pointers, you can pre-allocate it large enough for all practical uses. I don't know enough about your problem to recommend a specific solution, though.
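
As a sketch of what I mean (the sizes and names are arbitrary):

```c
#include <stdlib.h>

#define MAX_STRINGS 4096   /* "large enough for all practical uses" */

/* Top-level array of pointers, pre-allocated once; each slot points to a
 * separately malloc'ed character array that can be sized independently. */
typedef struct {
    char  *strings[MAX_STRINGS];
    size_t count;
} string_table;

/* Add a new character array of the given size; the table itself never moves. */
char *table_add(string_table *t, size_t size)
{
    if (t->count >= MAX_STRINGS)
        return NULL;
    char *s = malloc(size);
    if (s)
        t->strings[t->count++] = s;
    return s;
}
```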

quarterdome