4

Someone has written function in our C++ application and is already in the production and I don't know why it's not crashing the application yet. Below is the code.

char *str_Modify()
{
    char buffer[70000] = { 0 };
    char targetString[70000] = { 0 };

    memset(targetString, '\0', sizeof(targetString));

    ...
    ...
    ...
    return targetString;
}

As you can see, the function is returning the address of a local variable and the allocated memory will be released once the function returns. My question

  • Wanted to know what is the static data memory limit?
  • What can be the quick fix for this code? Is it a good practice to make the variable targetString static?
hyde
  • 60,639
  • 21
  • 115
  • 176
NJMR
  • 1,886
  • 1
  • 27
  • 46
  • Is this C or C++? What extra constraints there are? – hyde Nov 14 '17 at 11:15
  • 1
    If you make it static, add a bold comment above the code and any signature in an h file: 'NOT THREAD SAFE - DO NOT CALL FROM MULTIPLE THREADS'. Altrnatives are to mallocate the space or have the caller supply the buffer, both of which would require changes to calling code. – Martin James Nov 14 '17 at 11:16
  • 2
    Oh - and ensure that the original author no longer works for your company. – Martin James Nov 14 '17 at 11:16
  • Also, quick fix depends on how return value is used. Is static ok? Is memory leak ok (function called only fixed number of times)? Is it possible to change function signature? Is it possible to modify code at all places where this is called? – hyde Nov 14 '17 at 11:17
  • 1
    You could make it static but it's not a very clean solution. The best would by to pass a parameter to the new buffer to the function, eg: `char *str_Modify(char *stringtomodify)`, so it's the responsabilikty of the caller to provide the buffer for the modified string. – Jabberwocky Nov 14 '17 at 11:19
  • @MichaelWalz Let's make that `char *str_Modify(char *stringtomodify, size_t sizeofstringtomodify)`. All string functions should require and check available buffer size. – user694733 Nov 14 '17 at 11:34
  • @user694733: All string functions aside from the ones in the C Standard Library ;-) ? – Bathsheba Nov 14 '17 at 11:35
  • BTW: _static data memory limit?_ probably nothing you need to worry about. It should be large enough on any modern system. – Jabberwocky Nov 14 '17 at 11:38
  • @Bathsheba Thanks, now I need to suppress an sudden urge to start long rant about C standard library... – user694733 Nov 14 '17 at 11:44
  • @user694733: I think you should let it all out: I'll start with the dismal set of functions for reading in data. – Bathsheba Nov 14 '17 at 11:45
  • I'll follow with the static state in strtok().. – Martin James Nov 14 '17 at 15:22

4 Answers4

6

(Note that your call to memset has no effect, all the elements are zero-initialised prior to the call.)

It's not crashing the application since one manifestation of the undefined behaviour of your code (returning back a pointer to a now out-of-scope variable with automatic storage duration) is not crashing the application.

Yes, making it static does validate the pointer, but can create other issues centred around concurrent access.

And pick your language: In C++ there are other techniques.

Bathsheba
  • 231,907
  • 34
  • 361
  • 483
1

It is well defined behaviour in C and C++ because return an address of a static variable exist in memory after function call is over.

For example:

#include <stdio.h>

int *f()
{
    static int a[10]={0};
    return a;
}
int main() 
{
    f();
    return 0;
}

It is working fine on GCC compiler. [Live Demo]

But, if you remove static keyword then compiler generates warning:

prog.c: In function 'f':
prog.c:6:12: warning: function returns address of local variable [-Wreturn-local-addr]
     return a;
            ^

Also, see this question comments wrote by Ludin.

I believe you are confusing this with int* fun (void) { static int i = 10; return &i; } versus int* fun (void) { int i = 10; return &i; }, which is another story. The former is well-defined, the latter is undefined behavior.

Also, tutorialspoint say's :

Second point to remember is that C does not advocate to return the address of a local variable to outside of the function, so you would have to define the local variable as static variable.

msc
  • 33,420
  • 29
  • 119
  • 214
1

Returning targetString is indeed UB as other answers have said. But there's another supplemental reason why it might crash on some platforms (especially embedded ones): Stack size. The stack segment, where auto variables usually live, is often limited to a few kilobytes; 64K may be common. Two 70K arrays might not be safe to use.

Making targetString static fixes both problems and is an unalloyed improvement IMO; but might still be problematic if the code is used re-entrantly from multiple threads. In some circumstances it could also be considered an inefficent use of memory.

An alternative approach might be to allocate the return buffer dynamically, return the pointer, and have the calling code free it when no longer required.

As for why might it not crash: if the stack segment is large enough and no other function uses enough of it to overwrite buffer[] and that gets pushed first; then targetString[] might survive unscathed, hanging just below the used stack, effectively in a world of its own. Very unsafe though!

NickJH
  • 561
  • 3
  • 7
1

Wanted to know what is the static data memory limit?

Platform-specific. You haven't specified a platform (OS, compiler, version), so no-one can possibly tell you. It's probably fine though.

What can be the quick fix for this code?

The quick fix is indeed to make the buffer static.

The good fix is to rewrite the function as

char *modify(char *out, size_t outsz) {
    // ...
    return out;
}

(returning the input is just to simplify reusing the new function in existing code).

Is it a good practice to make the variable targetString static?

No. Sometimes it's the best you can do, but it has a number of problems:

  1. The buffer is always the same size, and always using ~68Kb of memory and/or address space for no good reason. You can't use a bigger one in some contexts, and a smaller one in others. If you really have to memset the whole thing, this incurs a speed penalty in situations where the buffer could be much smaller.
  2. Using static (or global) variables breaks re-entrancy. Simple example: code like

    printf("%s,%s\n", str_Modify(1), str_Modify(2));
    

    cannot work sanely, because the second invocation overwrites the first (compare strtok, which can't be used to interleave the tokenizing of two different strings, because it has persistent state).

  3. Since it isn't re-entrant, it also isn't thread-safe, in case you use multiple threads. It's a mess.
Useless
  • 64,155
  • 6
  • 88
  • 132