3

What is best practice when dealing with functions which return malloc'd pointers to C-strings?

Here's an example:

FILE *f;
char *tmp;
for (i = 0; i <= 10; i++) {
    tmp = concat(fn, i);
    f = fopen(tmp, "w");
    free(tmp);
    // read f, do something useful ...
    fclose(f);
}

char* concat(char *a, int b) returns a pointer to a new C-string which contains the concatenation of a and b.

Not only do I have to specify a temporary pointer which is then passed to fopen, I also have to free(tmp) every time. I would rather prefer something like this:

FILE *f;
char *tmp;
for (i = 0; i <= 10; i++) {
    f = fopen(concat(fn, i), "w");
    // read f, do something useful ...
    fclose(f);
}

But this of course leads to memory leaks. So what is best practice here? Something like concat(char *a, int b, char *result) where result is expected to be a pre-allocated memory for the resulting C-string? This solution has its disadvantages like limited or not optimal size of result.

CiPoint
  • 55
  • 3
  • 1
    When a function returns a `malloc`ed pointer, the caller **must** be responsible for `free`ing it. – Youssef13 May 27 '20 at 11:17
  • @lurker `concat` can return a `NULL`, `fopen` can not use `NULL` as first parameter, furthermore there is no way to `free` the result of `concat` using this approach. – David Ranieri May 27 '20 at 11:23
  • As you have noticed yourself, the second version shouldn't be used because it leaks memory. However, whether you allocate memory inside of `concat` or pass a pre-allocated buffer to it, is pretty much just a matter of personal preference (also one might be easier to integrate into existing code than the other) – Felix G May 27 '20 at 11:23
  • 1
    @Youssef13: [That is not true.](https://stackoverflow.com/questions/6346969/there-is-no-point-in-freeing-blocks-at-end-of-program/6347182#6347182) It is a routine practice taught to students but is neither mandated by the C standard nor beneficial in **all** situations. When writing a program for a general-purpose multi-user system, if you are going to allocate memory and keep it for the whole of program execution, there is no point in freeing it at the end, and doing so can have adverse consequences on performance. – Eric Postpischil May 27 '20 at 11:27
  • A function cannot return safely an address that exists only in its working space, right? [See question and answers here.](https://stackoverflow.com/questions/6441218/can-a-local-variables-memory-be-accessed-outside-its-scope) https://stackoverflow.com/questions/6441218/can-a-local-variables-memory-be-accessed-outside-its-scope – Leos313 May 27 '20 at 12:08
  • "*I also have to free(tmp) every time.*" - No, you *haven´t to*. Backlink to @EricPostpischil´s [comment](https://stackoverflow.com/questions/62041776/best-practice-regarding-char-pointers-returned-by-functions#comment109730849_62041776). – RobertS supports Monica Cellio May 27 '20 at 12:26
  • 1
    @EricPostpischil The major reason why you _should_ always `free` is that doing so exposes heap corruption bugs elsewhere in the program. Because then you get a crash upon calling `free` which enables you to catch that bug early on. The problem described in the link answer is not really related to heap allocation, but to how a certain OS handles swap files, and how the library port of malloc and heap API for that specific OS are designed. – Lundin May 27 '20 at 12:51
  • @Lundin: Suggesting that a billion users should be subjected to performance penalties so the developer can catch a few heap corruption bugs (that do not otherwise manifest in a way that allows detecting them) is a bad trade-off. Issues with swapping and the memory allocation routines having their data structures spread through memory are not specific to one OS. – Eric Postpischil May 27 '20 at 12:56
  • 1
    @EricPostpischil You could always leave out the `free` from the release build then, if you are concerned about that. In general, I suspect that the major culprit of programs freezing upon exit is that they do their clean-up from the main GUI thread. Instead of closing down the GUI asap and let some background thread chew away at the clean-up in the background, giving back the control of the computer to the user. This is a major problem in computer games for example, freezing upon exit even though are already loaded in RAM and not in a swap file. – Lundin May 27 '20 at 13:01

5 Answers5

2

Both approaches are used in the industry. In your example, people could make an assumption about the maximum size for the resulting filename and use a local array this way:

for (int i = 0; i <= 10; i++) {
    char filename[1024];
    snprintf(filename, sizeof filename, "%s%d", fn. i);
    FILE *f = fopen(filename, "w");
    if (f != NULL) {
        // read f, do something useful ...
        fclose(f);
    } else {
        // report the error?
    }
}

Note that the truncation can be detected with if (snprintf(filename, sizeof filename, "%s%d", fn. i) >= (int)sizeof filename).

If no assumption should be made about the filename length or if the filename composition method is more complicated, returning an allocated string may be a more appropriate option, but testing for memory allocation errors should also be done:

for (int i = 0; i <= 10; i++) {
    char *filename = concat(fn, i);
    if (filename == NULL) { 
       /* handle the error */
       ...
       // break / continue / return -1 / exit(1) ...
    }
    FILE *f = fopen(filename, "w");
    if (f == NULL) {
        /* report this error, using `filename` for an informative message */
    } else {
        // read f, do something useful...
        // keep `filename` available for other reporting 
        fclose(f);
    }
    free(filename);
}

If you are not ready to perform all this bookkeeping, you should probably use a different language with more elaborate object life cycle management or with a garbage collector.


Finally, using C99 compound literals, you could define concat to fit your simplified use case:

char *concat(char *dest, const char *s, int b) {
    sprintf(dest, "%s%d", s, b);
    return dest;
}
#define CONCAT(a, b) concat((char[strlen(a) + 24]){""}, a, b)

CONCAT defines an unnamed local variable length char array of the appropriate size and constructs the concatenation of string a and int b into it. I changed the case to uppercase to underscore the fact that a is evaluated twice in the expansion, and thus should not be an expression that involve side-effects.

You could use this macro as expected in your second code fragment:

    FILE *f;
    char *tmp;
    for (i = 0; i <= 10; i++) {
        f = fopen(CONCAT(fn, i), "w");
        // read f, do something useful ...
        fclose(f);
    }

I probably would not recommend this type of usage, but this is only my opinion.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
2

A more solid design:

FILE *open_file_number(const char *str, int number)
{
    size_t size = strlen(str) + 32;
    char *path = malloc(size);

    if (path == NULL)
    {
        return NULL;
    }
    snprintf(path, size, "%s%d", str, number);

    FILE *file = fopen(path, "w");

    free(path);
    return file;
}

for (i = 0; i <= 10; i++)
{
    FILE *file = open_file_number(some_path, i);

    if (file != NULL)
    {
        // Do your stuff
        fclose(file);
    }
}
David Ranieri
  • 39,972
  • 7
  • 52
  • 94
1

Your first code snippet, where you save the returned pointer and free it when you're done using it, is the proper way to work with a function returning malloc'ed memory.

There are several POSIX functions, such as strdup and getline, that work in this manner, so this is a well known idiom.

Alternatives are:

  • Return a pointer to a static buffer. This has the disadvantage that it is not thread safe and also can't be called twice in one expression.
  • Accept a pointer to a properly sized buffer, in which case properly sizing the buffer is up to the caller.
dbush
  • 205,898
  • 23
  • 218
  • 273
  • Well... "this is a well known idiom" -> "this is a well known source for memory leaks" :) There's a lot of ancient UNIX trash functions with horrible API that just keeps living on. Some of them even made it to the C standard lib. Just because they've been around forever doesn't make them idiomatic. – Lundin May 27 '20 at 12:04
1

What is best practice when dealing with functions which return malloc'd pointers to C-strings?

Best practice: don't use them. A library expecting the caller to free returned data is almost certainly badly designed, with very few exceptions. We know this from 40 years of C language history, where crappily written libraries have created millions upon millions of memory leak bugs.

The basic rule of sane, useful library API design is:

Whoever allocates something is responsible for cleaning up their own mess.

Since C doesn't have RAII or constructors/destructors, that unfortunately means that the sane library needs to provide you with a function for the clean-up and you need to remember to call it. If it doesn't provide such a function, you might want to consider writing wrapper functions that do this - correcting the bad library design for them.

If you are the one implementing the library, you should always try to leave memory allocation to the caller, whenever possible. This is traditionally done by the function taking a pointer to a buffer which it writes to. Then either leaves it to the caller to allocate enough memory (like strcpy/strcat), or alternatively provide a variable with maximum buffer size, after which the function returns how much of the buffer it actually used (like fgets).


In your example, a soundly designed concat would perhaps look like

const char* concat (char* restrict dst, const char* restrict src, int i);

Where src is the source string, i is the integer to add and dst is a large-enough buffer provided by the caller. Optionally, the function returns a pointer equivalent to dst, for convenience. The above also implements proper const correctness plus a micro-optimization with restrict that means that the pointers passed aren't allowed to overlap.

Usage:

char buf [LARGE_ENOUGH];
fp = fopen(concat(buf, foo, i), "w");
Lundin
  • 195,001
  • 40
  • 254
  • 396
  • 1
    Functions which have a side-effect and returns a value are another source of bugs (even though common practice in the C programming language). – August Karlstrom May 27 '20 at 12:20
  • @AugustKarlstrom Sure, but what does this have to do with this answer? The function I posted should be fully re-entrant. – Lundin May 27 '20 at 12:43
  • 1
    I would use void as the return type of *concat* to promote code readability, that's all. – August Karlstrom May 28 '20 at 09:45
  • @AugustKarlstrom Yes I prefer such style too, but in case the function doesn't have side-effects and the returned string is `const` qualified, there's no harm to it. – Lundin May 28 '20 at 09:53
  • The harm is that you can potentially miss the fact that the first parameter is an output parameter when the call to *concat* is embedded in an expression. In this case there is too much going on in the same statement. – August Karlstrom May 28 '20 at 09:59
  • @AugustKarlstrom I agree - it is always best to split expressions over several lines. But obviously a production quality function needs to document what the parameters and return type do - at a minimum through source code documentation in comments. – Lundin May 28 '20 at 10:04
0

If you know the maximum size of your strings, you could also do something like this:

char* concat_(char* buf, size_t s, char *a, int b)
{
    /* ... your code ... */
    return buf;
}

#define concat(a, b)    concat_((char[100]){""}, 100, (a), (b))

The macro allocates an unnamed local variable and passes it on to the concat_ function. This function can then do whatever it did before, and just return the pointer to that same object. No malloc, no free, no worries (other than possibly blowing up your stack, if you make the buffer too big).

Edit: Please note that, as Gerhardh pointed out in the comments, this creates a local object (which has automatic storage duration), so the pointer returned by the macro is only valid within the same block where that macro was actually called. Therefore you can't for instance use that pointer as a return value of the functoin which called the macro.

Felix G
  • 674
  • 1
  • 7
  • 17
  • "No malloc, no free, no worries" Well, you mustn't return the result of that macro as that local variable would end to exist. – Gerhardh May 27 '20 at 11:44
  • @Gerhardh Of course, but that's why i wrote "unnamed **local** variable" instead of "compound literal", as it should hopefully be pretty obvious even to a beginner, that you don't pass around pointers to local variables – Felix G May 27 '20 at 11:54
  • If you follow the questions at SO, you will find out soon, that this is not nearly as obvious as you might think. ;) This problems comes up at least twice per week. Also this is a clear limitation that should be mentioned explicitely. – Gerhardh May 27 '20 at 11:55
  • No worries until you exceed the magic number 100, that is... I wouldn't really recommend hiding away the buffer size behind some macro. – Lundin May 27 '20 at 11:58
  • @Gerhardh The compound literal in this case would have the same scope as the enclosing caller block. It's not local to the function, it is local to the caller. – Lundin May 27 '20 at 11:59
  • @Lundin which is why my answer starts with "If you know the maximum size" ... i had hoped that at least that would be obvious ^^ – Felix G May 27 '20 at 12:00
  • @Lundin I am talking about something like `char*func(void){char*ptr=concat("a","b"); return ptr;}` – Gerhardh May 27 '20 at 12:02