1

When returning data from my functions, I often consider returning an allocated buffer rather than letting the function take a buffer as an argument. I was trying to figure out whether there is any significant benefit to passing a buffer to my function, e.g.:

void f(char **buff) {
   /* operations */
   strcpy(*buff, value);
}

Versus

char *f() {
    char *buff = malloc(BUF_SIZE);
    /* operations */
    return buff;
}

These are obviously not super advanced examples, but I think the point stands. But yeah, are there any benefits to letting the user pass an allocated buffer, or is it better to return an allocated buffer?

kavulox
    You couldn't validly return char buff[8]; (a local array), since it doesn't continue to exist after function exit. pointer to a string literal, pointer to a static, or typically pointer to a malloc'ed buffer are the return options beyond returning a struct of plain type by value. And if and only if it is a pointer to a malloc'ed buffer, the caller also receives the implicit responsibility of freeing the buffer when done with it. – Avi Berger Sep 20 '22 at 19:26
  • 1
    Will you always be returning pointers to string literals, or might you be returning dynamically allocated material? Data read from a file? The context matters. You can get away with a lot with string literals (but you shouldn't try modifying them, so you should probably be using `const` in the types). But with other sources of data, you have different trade-offs. Look at the design of POSIX [`getline()`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html), for example, versus [`fgets()`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/fgets.html). – Jonathan Leffler Sep 20 '22 at 19:36
  • @JonathanLeffler I should've added more context. I'll check out those two, thanks. – kavulox Sep 20 '22 at 19:42
  • You can still edit your question. Indeed, at the moment, it would be good if you did edit it and added more context. This has the potential to be a good question with appropriate context added. – Jonathan Leffler Sep 20 '22 at 19:43
  • 1
    This code just shows two options for returning an address. It is not relevant that it is pointing to a buffer; it might as well be an `int` or another data type instead of a pointer. So the question is essentially the same as “Should I return an `int` with `int f(void)` or `void f(int *p)`”? Absent any reason to write the more complicated code, write the simpler code: Return the value as a function return value. That is what function return values are for. With optimization, they might perform very similarly, but there is no reason to use the nominally more complicated method. – Eric Postpischil Sep 20 '22 at 19:43
  • 2
    If the question were instead whether a buffer should be allocated in the function and then returned to the caller or should be provided by the caller and merely filled in by the function, that would be a different question. – Eric Postpischil Sep 20 '22 at 19:44
  • Meditate on why `fopen()` returns a pointer to a buffer (or NULL) and how that differs from `fread()` that fills in a buffer. The use case for passing or returning buffers will become clear. Then, consider `realloc()`... and `puts( strcat( strcpy( buf, "Hello " ), "World!" ) );` (PS: Who'd imagine `strdup();` could cause memory leaks... Simple str function!! ) – Fe2O3 Sep 20 '22 at 20:03
  • Related: [Is it bad practice to allocate memory in a DLL and give a pointer to it to a client app?](https://stackoverflow.com/q/13625388/) – jamesdlin Sep 20 '22 at 20:05

3 Answers

2

Are there any benefits to using one over the other, or is it just useless?

This is a specific case of the more general question of whether a function should return data to its caller via its return value or via an out parameter. Both approaches work fine, and the pros and cons are mostly stylistic, not technical.

The main technical consideration is that each function has only one return value, but can have any number of out parameters. That can be worked around, but doing so might not be acceptable. For example, if you want to reserve your functions' return values for use as status codes such as many standard library functions produce, then that limits your options for sending back other data.
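As a minimal sketch of that trade-off, assuming a hypothetical `divide_checked` helper (it is not part of the question), the return value carries only a status code while two out parameters carry the actual results:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper for illustration only.
 * Returns 0 on success and -1 on failure; the quotient and remainder
 * are sent back through the out parameters quot and rem. */
int divide_checked(int num, int den, int *quot, int *rem) {
    if (den == 0 || quot == NULL || rem == NULL) {
        return -1;              /* status code: invalid arguments */
    }
    if (num == INT_MIN && den == -1) {
        return -1;              /* status code: result would overflow */
    }
    *quot = num / den;
    *rem = num % den;
    return 0;                   /* status code: success */
}
```

The caller checks the status first and only then reads the out parameters, which is the pattern many standard library functions follow.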

Some of the stylistic considerations are

  • using the return value is more aligned with the idiom of a mathematical function;
  • many people have trouble understanding pointers; and in particular,
  • non-local modifications effected through pointers sometimes confuse people. On the other hand,
  • the return value of a function can be used directly in an expression.

With respect to modifications to the question since this answer was initially posted, if the question is about whether to dynamically allocate and populate a new object vs populating an object presented by the caller, then there are these additional considerations:

  • allocating the object inside the function frees the caller from allocating it themselves, which is a convenience. On the other hand,
  • allocating the object inside the function prevents the caller from allocating it themselves (maybe automatically or statically), and does not provide for re-initializing an existing object. Also,
  • returning a pointer to an allocated object can obscure the fact that the caller has an obligation to free it.

Of course, you can have it both ways:

void init_thing(thing *t, char *name) {
    t->name = name;
}

thing *create_thing(char *name) {
    thing *t = malloc(sizeof(*t));

    if (t) {
        init_thing(t, name);
    }
    return t;
}
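A hedged usage sketch of both paths follows. It assumes `thing` is a struct with a single `name` member, since the answer does not show its definition; `use_automatic` and `use_dynamic` are hypothetical names added for illustration.

```c
#include <stdlib.h>

/* Assumed definition; the answer does not show the real one. */
typedef struct thing {
    char *name;
} thing;

/* Populates an object that the caller already owns. */
void init_thing(thing *t, char *name) {
    t->name = name;
}

/* Allocates a new object and populates it; returns NULL on failure. */
thing *create_thing(char *name) {
    thing *t = malloc(sizeof(*t));

    if (t) {
        init_thing(t, name);
    }
    return t;
}

/* Caller-owned storage: no allocation happens, so there is nothing to free. */
void use_automatic(void) {
    thing local;
    init_thing(&local, "automatic");
}

/* Function-owned allocation: the caller takes on the duty to free it. */
void use_dynamic(void) {
    thing *heap = create_thing("dynamic");
    if (heap) {
        free(heap);
    }
}
```

Offering both functions lets callers with automatic or static storage skip `malloc` entirely, while callers who want a heap object get a one-call convenience.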
John Bollinger
0

Both options work.
But in general, returning information through the parameters (the second option) is preferable because we usually reserve the return value of the function to report an error, and we can return several pieces of information through multiple parameters. Hence, it is easier for the caller to check whether the function succeeded by first checking the returned value. Most of the services from the C library or the Linux system calls work like this.

Concerning your examples, both options work because you are referencing a constant string which is globally allocated at program's loading time. So, in both solutions, you return the address of this string.
But if you do something like the following:

char *func(void) {
   char buff[] = "example";
   return buff;
}

You actually copy the content of the constant string "example" into the function's stack area, which buff designates. In the caller, the returned address is no longer valid, as it refers to a stack location that can be reused by any other function called by the caller.
Let's compile a program using this function:

#include <stdio.h>

char *func(void) {
   char buff[] = "example";
   return buff;
}

int main(void) {

  char *p = func();

  printf("%s\n", p);

  return 0; 

}

If the compiler is smart enough, we get a first red flag with a warning like this:

$ gcc -g bad.c -o bad
bad.c: In function 'func':
bad.c:5:11: warning: function returns address of local variable [-Wreturn-local-addr]
    5 |    return buff;
      |           ^~~~

The compiler points out the fact that func() is returning the address of a local space in its stack which is no longer valid when the function returns. This is the compiler option -Wreturn-local-addr which triggers this warning. Let's deactivate this option to remove the warning:

$ gcc -g bad.c -o bad -Wno-return-local-addr

So, now we have a program that compiles with no warnings, but this is misleading, as the execution fails or may trigger unpredictable behavior:

$ ./bad
Segmentation fault (core dumped)
Rachid K.
  • Re “we usually reserve the return of the function to signal an error”: No, using the return value of the function is one option for using functions, but it is not usually reserved for this, in the sense there is a free choice depending on what the function designer wants. There are plenty of common functions that return some data other than an error indication or code, such as `getchar`, `sin`, `isupper`, `asctime`, and more. – Eric Postpischil Sep 20 '22 at 20:25
  • Yes for simple functions this is true but for functions returning several values or a complex value (e.g. stat() returning a structure stat or accept() returning a structure and its size), it is preferable to return the result from the parameters. – Rachid K. Sep 20 '22 at 20:30
  • Concerning some standard functions like getchar(), it is in my opinion a pity to return the result this way, as it is difficult to know whether the call worked or not. getchar() returns EOF if there is an error or if this is a real end of file. This makes it quite difficult for the caller to analyze the result afterwards. – Rachid K. Sep 20 '22 at 20:36
-1

You can't return the address of local memory.

Your first example works because the memory in "example" will not be deallocated. But if you allocate local (aka automatic) memory, it will automatically be deallocated when the function returns; the returned pointer will be invalid.

char *func() {
   char buff[10];

   // Copy into local memory
   strcpy(buff, "example");

   // buff will be deallocated after returning.
   // warning: function returns address of local variable
   return buff;
}

You can either return dynamic memory allocated with malloc, which the caller must then free.

char *func() {
  char *buf = malloc(10);
  strcpy(buf, "example");
  return buf;
}

int main() {
  char *buf = func();
  puts(buf);
  free(buf);
}

Or you let the caller allocate the memory and pass it in.

void func(char *buff) {
   // Copy a string into the memory provided by the caller
   strcpy(buff, "example");
}

int main() {
  char buf[10];
  func(buf);
  puts(buf);
}

The upside is the caller has full control of the memory. They can reuse existing memory, and they can use local memory.

The downside is the caller must allocate the correct amount of memory. This might lead to allocating too much memory, or too little.

An additional downside is the function has no control over the memory which has been passed in. It cannot grow nor shrink nor free the memory.
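One common way to reduce the risk of an undersized buffer is to pass its size along with it. The following is a minimal sketch under that assumption; `fill_example` is a hypothetical name. It relies on `snprintf`, which truncates rather than overflows and returns the length the full string would have needed.

```c
#include <stdio.h>

/* Hypothetical function for illustration only.
 * Copies "example" into buff without ever writing more than size bytes.
 * Returns 0 on success, or -1 if the buffer was too small and the
 * output was truncated. */
int fill_example(char *buff, size_t size) {
    int needed = snprintf(buff, size, "%s", "example");

    if (needed < 0 || (size_t)needed >= size) {
        return -1;
    }
    return 0;
}
```

A caller would typically write `char buf[10]; fill_example(buf, sizeof buf);` and check the result before using the buffer.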

You can only return one thing from a function.

For example, if you want to convert a string to an integer you could return the integer like atoi does. int atoi( const char *str ).

int num = atoi("42");

But then what happens when the conversion fails? atoi returns 0, but how do you tell the difference between atoi("0") and atoi("purple")?

You can instead pass in an int * for the converted value. int my_atoi( const char *str, int *ret ).

int num;
int err = my_atoi("42", &num);
if(err) {
  exit(1);
}
else {
  printf("%d\n", num);
}
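The answer only gives the signature of `my_atoi`, so the body below is an assumption: one possible sketch built on the standard `strtol`, returning 0 on success and nonzero on failure.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Sketch of my_atoi: the converted value goes out through ret,
 * and the return value is reserved for the error status. */
int my_atoi(const char *str, int *ret) {
    char *end;
    long value;

    errno = 0;
    value = strtol(str, &end, 10);

    if (end == str || *end != '\0') {
        return 1;   /* no digits at all, or trailing characters */
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        return 1;   /* value does not fit in an int */
    }
    *ret = (int)value;
    return 0;
}
```

With this shape, `my_atoi("0", &num)` succeeds and stores 0, while `my_atoi("purple", &num)` reports failure and leaves `num` untouched, which is the distinction plain `atoi` cannot make.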
Schwern
  • The data in the stack is not automatically deallocated upon the return of a function, it stays as it is until a new function call overwrites the area with its own local variables. So, this is why this kind of bug is tricky because it may stay hidden a long time depending on the paths taken by the program. – Rachid K. Sep 20 '22 at 20:45
  • 1
    @RachidK. It is *deallocated* upon return, meaning it is marked as being available. When it is overwritten is an implementation detail. Yes, the contents might stick around and most implementations act that way and it can be tricky to debug. However, compilers will warn you about it. – Schwern Sep 20 '22 at 21:04
  • Yes, we agree. So, the term should be "no longer valid" or something like that instead of "deallocated", which is misleading because it suggests that an action takes place and, as you said, that is implementation dependent... – Rachid K. Sep 20 '22 at 21:06
  • 1
    @RachidK. "No longer valid" is fine, but "deallocate" is the correct term. For example, from 7.22.3.3 "*The free function causes the space pointed to by ptr to be deallocated, that is, made available for further allocation.*" Memory is allocated (the "alloc" in malloc, calloc, and realloc) and deallocated (free or upon return). Once deallocated, the contents are invalid and can be overwritten at any time. Deallocation is defined by the standard and is not implementation dependent. When deallocated memory may be overwritten is implementation dependent. – Schwern Sep 20 '22 at 21:10
  • 1
    @RachidK. The technical way to put it is in 6.2.4 "*[automatic] lifetime extends from entry into the block with which it is associated until execution of that block ends in any way.*" That's the automatic allocation/deallocation defined by the standard. "*If an object is referred to outside of its lifetime, the behavior is undefined.*" That's about when the contents might be overwritten. – Schwern Sep 20 '22 at 21:14