3

I have format string and I'm parsing it than replacing format specifiers with input arguments. Now I consider how to allocate memory for such result string after substitution of arguments. I could allocate this string as long as format string, but than substitution of other string in the place of %s of any long will need to reallocate this string in some indeterministic fusion, which making it necessary to do some inelegant calculations in the code. So I thought I could allocate this string created from format string just char by char, reallocating it each time like:

/*** for loop traversing next chars in format string ***/
// if new char 
str = realloc(str, sizeof(*str) +1); 
// if %s 
str = realloc(str, sizeof(*str) + strlen(in_str)); 
// if %d 
str = realloc(str, sizeof(*str) + strlen(d_str)); 
Michał Ziobro
  • 10,759
  • 11
  • 88
  • 143
  • 6
    These statements seem incorrect: `sizeof(*str)` is **not** the size of the string allocated so far. – chqrlie Aug 14 '16 at 10:08
  • You could create a lightweight String class (for C, a `struct`) which holds a current `max_length` (which initially should be "roughly enough for most cases"). It would add about the same overhead, though, because you'd always need to check against the max length *before* adding to it. – Jongware Aug 14 '16 at 10:08
  • 4
    the pattern `str = realloc(str, ` doesn't allow for recovery from out-of-memory – M.M Aug 14 '16 at 10:31
  • This is a bad idea (apart from the error already hinted at). If you have an idea on string size (like "it's never going to be longer than 160, 200, 300 chars") you'd be better off allocating a static buffer and not constantly bothering the OS with memory allocation calls. Depending on the system, the memory waste could be much easier tolerated than the OS overhead. – tofro Aug 14 '16 at 12:15
  • 2
    Are you having a performance problem that can be fixed by not reallocating strings so much? If so, it's bad and you can fix it. If you're not having performance problems, don't worry about how many times you call `realloc()`. – Andrew Henle Aug 14 '16 at 12:26

3 Answers3

6

Usually internal library code which deals with length of mutable strings/array/lists/whatever else does it in 2^n steps - i.e. when you have 4 bytes of memory and need to allocate 5, it actually allocates 8. This will reduce number of realloc() calls, which is expensive, to ~log(n) operations. However there could be other optimisations, depending on library.

Nickolay Olshevsky
  • 13,706
  • 1
  • 34
  • 48
4

I won't comment about issues with the code that are adequately addressed in other answers.

Calling realloc for every individual act of extension of string isn't necessarily a poor practice. It looks as if it might perform badly, and to fix that, you could implement a scheme which grows the string in larger increments, less frequently. However, how do you know that realloc doesn't already do something of that sort internally? For instance you might think you're being clever by growing a 128 byte string to 256 bytes and then to 512, rather than one character at a time. But if those happen to be the only available malloc internal block sizes, then realloc cannot help but also step through the same sizes. Okay, what about the saving in the raw number of realloc calls that are made? But those are just replaced by invocations of the more clever string growing logic.

If the performance of this format-string-building loop matters, then profile it. Make a version that reduces the realloc operations and profile that, too. Compare it on all the target platforms you write for and on which the performance matters.

If the performance isn't critical, then optimize for properties like good structure, maintainability and code reuse. A function which builds a format string doesn't have to know about string memory management. It needs an abstract interface for working with dynamic strings. That interface can provide a nice function for changing a string in-place by adding a copy of another string at its tail. Functions other than just the format-string producer can use these string operations.

The string management code can decide, in one place, whether it calls realloc every time the length of a string changes, or whether it tracks the storage size separately from the length, and consequently reduces the number of realloc calls.

Kaz
  • 55,781
  • 9
  • 100
  • 149
3

Code like this:

str = realloc(str, sizeof(*str) +1);

is bad. If realloc fails it will return NULL but it will not free(str). In other words - memory leak. You need to assign the result of realloc to another pointer, then check for NULL and act accordingly.

Whether or not it is good or bad practice to use many realloc depends on what you are trying to obtain, i.e. performance, maintainability, clearness, etc. The best advice is: Write the code as you like it to be. Then profile it. Performance issues? No -> Be happy. Yes -> Rewrite the code with focus on performance.

Support Ukraine
  • 42,271
  • 4
  • 38
  • 63
  • 3
    *... `str = realloc(str, sizeof(*str) +1);` is bad.* It's only bad if the code makes an attempt to recover from a failed allocation. If the response to a failed allocation is to `exit()` the process , the leak is irrelevant. – Andrew Henle Aug 14 '16 at 12:29
  • @AndrewHenle In addition to Andrew's comment, if the code crashes because of the null pointer, likely the case, the leak is also irrelevant. – Kaz Aug 14 '16 at 12:30
  • @Kaz - That is not guaranteed. On most modern systems everything will be fine but as a general piece of c-code, it is bad. See: http://stackoverflow.com/questions/2213627/when-you-exit-a-c-application-is-the-malloc-ed-memory-automatically-freed – Support Ukraine Aug 14 '16 at 12:44
  • @4386427 (Also not guaranteed on non-modern systems is that dereferencing a null pointer will crash). – Kaz Aug 14 '16 at 12:47
  • @Kaz Crashing from a `null` pointer dereference after `malloc()` runs out of memory is likely to result in the creation of a rather large core file on some OSes. That can have adverse consequences. – Andrew Henle Aug 14 '16 at 15:11
  • @AndrewHenle That tends to be disabled by default nowadays, or severely curtailed with `ulimit`. – Kaz Aug 14 '16 at 16:55