27

I have a use case where I can get pointers of strings allocated either in memory or literals. Now the latter can't be freed so that's a problem if I pass the wrong one. Is there a way to know which one is allocated and which not?

char *b = "dont free me!";
if(!IS_LITERAL(b)) {
    free(b);
}

I imagine something like that.

My example:

Scenario 1: literal

char *b = "dont free me!";
scruct elem* my_element = mylib_create_element(b);
// do smth
int result = mylib_destroy_element(my_element); // free literal, very bad

Scenario 2: in heap

char *b = malloc(sizeof(char)*17); // example
strncpy(b, "you can free me!",17);

scruct elem* my_element = mylib_create_element(b);
// do smth
int result = mylib_destroy_element(my_element); // free heap, nice

How the user calls mylib_create_element(b); is not under my control. If he frees before mylib_destroy_element it can crash. So it has got to be mylib_destroy_element that cleans up.

BartoszKP
  • 34,786
  • 15
  • 102
  • 130
alknows
  • 1,972
  • 3
  • 22
  • 26
  • 5
    you can avoid this problem by doing the de-allocation in the same part of the code as the allocation. Read up on RAII eg. for one approach. – Sander De Dycker Mar 06 '15 at 10:21
  • 2
    Just because it's not a literal it doesn't mean you need to call `free`. For example `char buffer[20]; b=buffer;` buffer isn't a literal, but it hasn't been allocated on the heap, either. – Sean Mar 06 '15 at 10:21
  • 2
    By having only a `char*`, there's no way to tell where it's pointing to. It's just a char pointer, doesn't matter how you set it and it doesn't have any additional info. You have to track it yourself if you allocate dynamically. – P.P Mar 06 '15 at 10:21
  • @SanderDeDycker this is going to be a lib so i cant control what comes in. I offer a destroy function which should clean up. what the create function did. – alknows Mar 06 '15 at 10:27
  • 12
    @Aldi : you should either have your library do both the allocation and de-allocation, or neither. Don't de-allocate an object in the library that was allocated outside of the library. Or in other words : if you control both the create and destroy functions, then you do control what comes into the library. – Sander De Dycker Mar 06 '15 at 10:30
  • 1
    A very similar question http://stackoverflow.com/q/26950411/57428 – sharptooth Mar 06 '15 at 10:36
  • 1
    Already asked here: http://stackoverflow.com/q/26950411/1382251 – barak manos Mar 06 '15 at 10:41
  • You are on the wrong way. For example, just think what happens if you get passed a pointer to a stack char array? Memory-management of arguments to API functions in C is something that the user should do, not the API. –  Mar 06 '15 at 11:18
  • 2
    How can you do RAII in bare C when you don't have destructors to help you? – pjc50 Mar 06 '15 at 13:37
  • 1
    @pjc50 : you can't do RAII in C, but you don't really have to since there are no exceptions. Wikipedia claims that there is this though: http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization#GCC_.22cleanup.22_extension_for_C – choobablue Mar 06 '15 at 22:25
  • There is an item about this in [*The Pragmatic Programmer*](https://pragprog.com/the-pragmatic-programmer/extracts/tips) that is just as spot-on as everything in that book: “**Finish What You Start** – Where possible, the routine or object that allocates a resource should be responsible for deallocating it.” – 5gon12eder Mar 06 '15 at 23:01
  • You allocated some strings on the heap and pulled some from static storage, _and yet you don't know which you did?_ – geometrian Mar 07 '15 at 00:33

8 Answers8

26

I've had a similar case recently. Here's what I did:

If you're making an API that accepts a string pointer and then uses it to create an object (mylib_create_element), a good idea would be to copy the string to a separate heap buffer and then free it at your discretion. This way, the user is responsible for freeing the string he used in the call to your API, which makes sense. It's his string, after all.

Note that this won't work if your API depends on the user changing the string after creating the object!

  • I think im gonna go for this – alknows Mar 06 '15 at 12:06
  • Another possibility would be if the API makes no copy, and the user is responsible for removing the string from the API and then freeing it. The POSIX `putenv` API works this way. – Random832 Mar 06 '15 at 14:35
  • @Random832: that's nice, but it depends on the use-case. My API, for example, dealt with objects sent in callbacks, so it couldn't afford this: if the string passed in was a stack-based `char` array, it would go out of scope, giving the user a nasty surprise. Not knowing the use-case here, I tried to give a solution that would fit the most use-cases. Making no copy might actually be better in some scenarios: some users could modify the string passed to the function expecting the object to change. But either way, it has to be properly documented =) –  Mar 06 '15 at 14:44
  • @Aldi this is such a common technique that there's a function to specifically copy a string to the heap: [`strdup`](http://stackoverflow.com/questions/252782/strdup-what-does-it-do-in-c) – Mark Ransom Mar 06 '15 at 16:21
  • @MarkRansom the question was, can we be smarter than that? Apparently there is no standard out of the box solution. And best practice dictates what Mints97 suggested. Thanks guys – alknows Mar 07 '15 at 15:27
23

On most Unixes, there are values 'etext' and 'edata'. If your pointer is between 'etext' and 'edata', then it shall be statically initialized. Those values are not mentioned in any standard, so the usage is non portable and at your own risk.

Example:

#include<stdio.h>
#include<stdlib.h>

extern char edata;
extern char etext;

#define IS_LITERAL(b) ((b) >= &etext && (b) < &edata)

int main() {
    char *p1 = "static";
    char *p2 = malloc(10);
    printf("%d, %d\n", IS_LITERAL(p1), IS_LITERAL(p2));
}
Marian
  • 7,402
  • 2
  • 22
  • 34
  • This is interesting. This answers my question actually although i cant use it if not a standard feature and portable. Thanks – alknows Mar 06 '15 at 12:07
12

You can only ask user to explicitly mark their input as literal or allocated string.

However, as @Mints97 mentions in his answer, basically this approach is architecturally incorrect: you force user of your library for some explicit actions, and if he forgets to, it leads most likely to a memory leak (or even to an application crash). So use it only if:

  1. You want to drastically reduce amount of allocations. In my case it was JSON node names, that never change during lifetime of a program.
  2. You have good control of code of consumers of your library. In my case, libraries are shipped with binaries and tightly bound to them.

Implementation example

#define AAS_DYNAMIC             'D'
#define AAS_STATIC              'S'

#define AAS_STATIC_PREFIX       "S"
#define AAS_CONST_STR(str)      ((AAS_STATIC_PREFIX str) + 1)

char* aas_allocate(size_t count) {
    char* buffer = malloc(count + 2);
    if(buffer == NULL)
        return NULL;

    *buffer = AAS_DYNAMIC;

    return buffer + 1;
}

void aas_free(char* aas) {
    if(aas != NULL) {
        if(*(aas - 1) == AAS_DYNAMIC) {
            free(aas - 1);
        }
    }
}

...

char* s1 = AAS_CONST_STR("test1");
char* s2 = aas_allocate(10);

strcpy(s2, "test2");

aas_free(s1);
aas_free(s2);

Testing performance (note #1)

I benchmarked my libtsjson library with following code (800k iterations):

    node = json_new_node(NULL);
    json_add_integer(node, NODE_NAME("i"), 10);
    json_add_string(node, NODE_NAME("s1"), json_str_create("test1"));
    json_add_string(node, NODE_NAME("s2"), json_str_create("test2"));
    json_node_destroy(node);

My CPU is Intel Core i7 860. If NODE_NAME is just a macro, time per iteration was 479ns If NODE_NAME is a memory allocation, time per iteration was 609ns

Hinting user or compiler (note #2)

  • Add a hint to all such pointers, i.e. Linux static source analyser Sparse may catch such issues

    char __autostring* s1 = aas_copy("test"); /* OK */
    char __autostring* s2 = strdup("test");   /* Should be fail? */
    char* s3 = s1;                            /* Confuses sparse */
    char* s4 = (char*) s1;                    /* Explicit conversion - OK */
    

(not completely sure about outputs of Sparse)

  • Use simple typedef to make compiler raise a warning when you do something wrong:

    #ifdef AAS_STRICT
    typedef struct { char a; } *aas_t;
    #else
    typedef char *aas_t;
    #endif
    

This approach is one more step to a world of a dirty C hacks, i.e. sizeof(*aas_t) is now > 1.

Full source with changes may be found here. If compiled with -DAAS_STRICT it will raise tons of errors: https://ideone.com/xxmSat Even for correct code it can complain about strcpy() (not reproduced on ideone).

myaut
  • 11,174
  • 2
  • 30
  • 62
  • Thats quite a nice workaround. the issue is that i cant force the users of this lib to create their strings like that. Thanks for sharing. – alknows Mar 06 '15 at 12:47
  • 3
    This works, but it's bad API design. All other functions in C just take plain strings, why wouldn't yours? It's just not what you'd expect when you see a function signature like ``mylib_create_elem(char*)``! And if you use it wrongly, you'll silently leak memory and not get any warning. Mints97's answer is much better all around.. – Christian Aichinger Mar 06 '15 at 13:51
  • @ChristianAichinger, yes, you and Mints97 are correct. At the moment I answered, Aldi didn't mention he will expose his interface as library call. When I designed my API, the key idea was to increase performance. I've added notes on performance of my approach, and how you can overcome issues it arises. – myaut Mar 06 '15 at 20:42
6

The simple answer is you cannot do this since C language does not demarcate stack, heap and data section.

If you wanted to have a guess - you could collect address of the first variable on the stack, address of the calling function and address of a byte of memory allocated to heap; and then compare it with your pointer - a very bad practice with no guarantees.

It's best for you to revamp your code such a way that you don't come across this issue.

Arjun Sreedharan
  • 11,003
  • 2
  • 26
  • 34
  • 2
    Fun fact: the word *stack* appears nowhere in the text of C99. (I haven't checked C11; this *may* have changed with the introduction of threads.) There is a requirement to support recursive function calls, but a conventional stack, or even a LIFO data structure of some sort, is not necessary to meet that requirement; see for instance "Cheney on the M.T.A." – zwol Mar 07 '15 at 03:56
6

Here is a practical way:

Although the C-language standard does not dictate this, for all identical occurrences of a given literal string in your code, the compiler generates a single copy within the RO-data section of the executable image.

In other words, every occurrence of the literal string "dont free me!" in your code is translated into the same memory address.

So at the point where you want to deallocate that string, you can simply compare its address with the address of the literal string "dont free me!":

if (b != "dont free me!") // address comparison
    free(b);

To emphasize this again, it is not imposed by the C-language standard, but it is practically implemented by any decent compiler of the language.


The above is merely a practical trick referring directly to the question at hand (rather than to the motivation behind this question).

Strictly speaking, if you've reached a point in your implementation where you have to distinguish between a statically allocated string and a dynamically allocated string, then I would tend to guess that your initial design is flawed somewhere along the line.

barak manos
  • 29,648
  • 10
  • 62
  • 114
4

You can do the following:

  typedef struct 
{
 int is_literal;
 char * array;
} elem;

Every time you you allocate the elem.array on the heap simply set the is_literal to 0. When you set the array to be literal, set the flag to a non-zero value, e.g.:

elem foo;
foo.array = "literal";
foo.is_literal = 1 ;

or

elem bar;
bar.array = (char*) (malloc(sizeof(char) * 10)) ;
bar.is_literal = 0;

Then at the client side:

if(!bar.is_literal) {
free(bar.array);
}

Simple as that.

Evdzhan Mustafa
  • 3,645
  • 1
  • 24
  • 40
4

This is exactly why the rule is that only the piece of code or module that created a string may free it. In other words, every string or piece of data is "owned" by the code unit that created it. Only the owner can free it. A function should never free data structures that it received as arguments.

Tyler Durden
  • 11,156
  • 9
  • 64
  • 126
2

Back in the early days when a 80386 could have 8 megabytes of RAM maximum, and the ideas of making objects were being explained in every other magazine article, I did not like copying perfectly good literals into string objects (allocating and freeing the internal copy) and I asked Bjarne about that, since a crude string class was one of his examples of C++ fancy-stuff.

He said don't worry about it.

Is this having to do with literals vs other char* pointers? You can always own the memory. Imthink so, from your ideas of looking for different memory segments.

Or is it more generally that ownership may or may not be given, there is no way to tell, and need to store a flag: "hey, this is a heap object, but someone else is still using it and will take care of it later, OK?"

For the tractable case where it is "on the heap" or "not" (literals, globals, stack-based), you could have the free function know. If you supplied a matching set of allocate/maybe-free, it could be written to know what memory is under its control.

JDługosz
  • 5,592
  • 3
  • 24
  • 45