3

Why is Malloc not required to be called on string in struct?

Hi, I just started studying C this week, and I found a weird inconsistency I can't explain.

From what I know, when a function ends, the allocated items in the "stack" are removed from memory. If we want a variable to last longer past this (when the scope ends), we use malloc.

This makes total sense for structs and is consistent. If you create a struct then try to print the values outside of the scope of the function it was created - it will cause a error unless you malloc the struct.

However, I found that if I try to create a struct and malloc it with a default value (the size of the struct), then add a char pointer (string) as a struct member, it receives the value and no error is reached when trying to print it - I thought I would need to call malloc on the struct member itself, then copy the string into it?

Any ideas what is happening here??

This prints the struct member with no issue.

typedef struct person {
  char *name;
} person;

// print struct fields
void print_person(person *input) { printf("Name: %s", input->name); }

person *person_generator(char *name) {
  my_struct *person = malloc(sizeof(my_struct));
  // assign given char pointer to struct field
  person->name = name;
  return person;
}

person *create_default_person() {
  person *new_person = person_generator("Alfred Hitchcock");
  return new_person;
}

int main() {
  person *default_person = create_default_person();
  print_person(default_person);
  return 0;
}

Im confused why the following code below is not mandatory?

typedef struct person {
  char *name;
} person;

// print struct fields
void print_person(person *input) { printf("Name: %s", input->name); }

person *person_generator(char *name) {
  my_struct *person = malloc(sizeof(my_struct));
  // didnt need to do this?
  person->name = malloc(strlen(name) + 1);
  strcpy(person->name, name);
  return person;
}

char *get_string() {
  char *item = "Alfred Hitchcock";
  return item;
}

person *create_default_person() {
  char *value = get_string();
  person *new_person = person_generator(value);
  return new_person;
}

int main() {
  person *default_person = create_default_person();
  print_person(default_person);
  return 0;
}

I am expecting when "create_default_person()" function ends, the pointer it created is destroyed from the stack its inside of, then the struct member should be pointing to garbage data right?

EDIT

Additional question:

I was told the answer was that string literals are stored for the lifetime of the application. Lets take another example. Read a string from a file, then assign it into struct field without calling malloc.

Why does this still print correctly

typedef struct person {
  char *name;
} person;

void print_person(person *input) { printf("Name: %s", input->name); }

person *person_generator(char *name) {
  my_struct *person = malloc(sizeof(my_struct));
  person->name = name;
  return person;
}

person *create_default_person() {
  FILE *fp;
  fp = fopen("./data.txt", "r");
  char buffer[255];

  if (fp == NULL) {
    printf("Couldnt open\n");
    return NULL;
  }

  fgets(buffer, 255, fp);
  fclose(fp);

  person *new_person = person_generator(buffer);
  return new_person;
}

int main() {
  person *human = create_default_person();
  if (human == NULL) {
    printf("Failed to find human");
    return 0;
  }
  print_person(human);
  return 0;
}
  • 2
    Note: you have not checked that either call to `malloc` succeeds. You may also wish to use `person->name = strdup(name);`. – Chris Jul 28 '23 at 21:13
  • 1
    @Nathaniel Woodbury, Also consider how you would `delete_person(default_person)`. Likely that function would `free(person->name);` if `person->name` was allocated with `malloc()` and not free'd if its origin was `person->name = name;`. So best to allocate: `person->name = malloc(strlen(name) + 1); strcpy(person->name, name);` in all cases, so later code can free in all cases. – chux - Reinstate Monica Jul 28 '23 at 21:18
  • 1
    In your follow-up question, the example program is invoking undefined behavior inside `print_person()` when it tries to dereference a pointer to `char buffer[255]` after that buffer has gone away. When a program invokes undefined behavior, anything can happen, including output is that matches what you would expect from a correct program -- but that's just blind luck (i.e. nothing has overwritten that portion of the stack yet) and you can't rely on it to work that way. (hotel management will not be happy with you: https://stackoverflow.com/a/6445794/131930 ) – Jeremy Friesner Jul 28 '23 at 21:41
  • @JeremyFriesner Such a clear answer. Thank so much – Nathaniel Woodbury Jul 28 '23 at 21:46

2 Answers2

3

In C, string literals (like "Alfred Hitchcock") are stored statically as part of the process's static data.

As such, they are guaranteed to remain valid for the lifetime of the process, and their storage space doesn't need to be manually allocated or freed. A simple pointer to them is enough to use them, as you discovered.

If you were making up random new names at runtime, on the other hand, you would need to explicitly allocate space for those runtime-generated strings and manage their lifetimes with malloc()/free()/etc.

I am expecting when "create_default_person()" function ends, the pointer it created is destroyed from the stack its inside of, then the struct member should be pointing to garbage data right?

Not quite -- the pointer gets destroyed when the function ends, but the data the pointer points to does not. The string literal itself, in this case, remains valid indefinitely, regardless of what is or is not pointing to it.

Barmar
  • 741,623
  • 53
  • 500
  • 612
Jeremy Friesner
  • 70,199
  • 15
  • 131
  • 234
  • You might want to mention that the second version is better, because it doesn't require the caller to guarantee that the parameter string has permanent lifetime. – Barmar Jul 28 '23 at 21:01
  • Re “the *pointer* gets destroyed when the function ends”: Not sure which function you are talking about, but there is no pointer that gets destroyed in OP’s code. `create_default_person` passes the address of (the first character of) `"Alfred Hitchcock"` to `person_generator`. `person_generator` stores it in `person->name` and returns `person`, which is a pointer to memory it allocated. `create_default_person` initializes `new_person` with that returned address and returns it… – Eric Postpischil Jul 28 '23 at 21:15
  • … If some function did return a pointer to one of its local automatic objects, that object would be “destroyed” in the C++ sense. In C, we say its lifetime ends, which just means memory is no longer reserved for it. Most C implementations do not undertake any “destruction” activities beyond that. The returned address would not be destroyed in any sense, although its value would be *indeterminate* as defined by the C standard. – Eric Postpischil Jul 28 '23 at 21:16
  • Great answer. I have added an additional EDIT follow up question for anyone to answer. Thanks! – Nathaniel Woodbury Jul 28 '23 at 21:35
0

From what I know, when a function ends, the allocated items in the "stack" are removed from memory. If we want a variable to last longer past this (when the scope ends), we use malloc.

Any values declared static in the function will be there as long as your code is loaded into memory. May be you can not acess it anymore, but the values will be there to stay. In a next call to the same function they will have the same value and address. This is the same for string literals like item in char *item = "Alfred Hitchcock".

Areas allocated inside a function by calling malloc() are not magically removed: this is the mission of free and your code is responsible for that: areas allocated by malloc must the deallocated by your code, by calling free before returning from the function, or using program logic and returned pointers. Or else the area will be lost forever. This is called memory leaking

However, I found that if I try to create a struct and malloc it with a default value (the size of the struct), then add a char pointer (string) as a struct member, it receives the value and no error is reached when trying to print it - I thought I would need to call malloc on the struct member itself, then copy the string into it?

about this

Here is the struct

typedef struct person
{
    char* name;
} person;

person is just a container for a pointer to char. And has fixed size. You do not need to do nothing with it. It can point to a 1 gigabyte area, it can point do an array of double. It can point to NULL. If you need something there you need to put it there.

A bit off-topic but, as you are learning, see that you do not need 2 names for the struct. And there is a silent convention in C to reserve a first uppercase letter for naming names you define, so your person could be better as

typedef struct
{
    char* name;
} Person;

Also note that the compiler does not care about the way you align code. Indentation is just for reading. Notation is something religious and rarely we can choose how to declare or align things in source code: this is decided by your school, your company or your client. Often we can not even decide which language to use.

But reality is that when you declare name in char* name you are in fact declaring name as a char* as your IDE or compiler can tell you, so it may things easier for you write this way

typedef struct
{
    char* name;

}   Person;

and never

typedef struct
{
    char *name;

}   Person;

Here name is char*. It is being declared as such. If name is char* then obviously *name is char: this is the meaning of the * here in C. It is a consequence of the declaration, not the declaration per se.

Writing code this way can make it easier for you to read and understand your code later.

About the first code

void print_person(Person* input)
{
    printf("Name: %s", input->name);
}

It is ok but it would be smart to check for a NULL in input

person *person_generator(char *name) {
  my_struct *person = malloc(sizeof(my_struct));
  // assign given char pointer to struct field
  person->name = name;
  return person;
}

It is also somewhat ok, but as I told you before it is better to write as it is

Person* person_generator(char* name)
{
    Person* one = malloc(sizeof(Person));
    one->name = name; // can be NULL
    return one;
}

Problem here is that writing this way you have no control of what you are putting inside the struct. See your code below and I will come to this afterwards

Here

person *create_default_person() {
  person *new_person = person_generator("Alfred Hitchcock");
  return new_person;
}

It is ok, just a default, but consider

Person* create_default_person()
{
    return person_generator("Alfred Hitchcock");
}

It is easier to read.

back to person_generator

If you do not copy the string contents to a new allocated area pointed by name you are in for trouble. This area can not be freed safely if you do not do this. If person_generator (NULL) is called you could end up calling free(NULL) and it is harmeless. But if person_generator("Alfred Hitchcock") is called, as you do in create_default_person(), this string is static, a literal and can not be free, so your code will crash if you free the Person struct. Better is always own your data inside Person as in

Person* create_person(char* name)
{
    Person* one = (Person*)malloc(sizeof(Person));
    if (name == NULL) return one;
    one->name = (char*)malloc(sizeof(strlen(name) + 1));
    if (name == NULL) return one;  // if malloc failed
    strcpy(one->name, name);       // copy all data
    printf("\tAllocating \"%s\"\n", one->name);
    return one;
}

[name changed to create_person() from person_generator]

Many people will tell you to not cast the result of malloc. It is a thing from the 90's based on a book from '95 and on a C-faq document from some years later. This in ancient stuff: do no trust it. As you see here even in this modest 10-line code above sometimes you need to malloc many things in a single page of code and casting the returned pointers serve as documentation or warnings to you and your people. malloc just returns void* and by not letting the conversion implicit you can save youself a lot of trouble. Today we know that implicit things are not this good...

Also, when you write code to create a dynamic thing please get used to immediately write code to delete it. As in

Person* erase_person(Person* one)
{
    if (one == NULL) return NULL;
    if (one->name != NULL)
        printf("\tDeleting \"%s\"\n", one->name);
    free(one->name);
    free(one);
    return NULL;
}

And test it.

A complete example

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    char* name;
} Person;

Person* create_default_person();
Person* create_person(char*);
Person* erase_person(Person*);
void    print_person(Person*);

int main(void)
{
    Person* five[5] = {0};
    for (int i = 0; i < 5; i += 1)
        five[i] = create_default_person();
    printf("    Persons:\n");
    for (int i = 0; i < 5; i += 1) print_person(five[i]);
    printf("    Deleting all Persons\n");
    five[0] = erase_person(five[0]);
    for (int i = 0; i < 5; i += 1)
        five[i] = erase_person(five[i]);
    five[0] = create_person("Isac Asimov");
    print_person(five[0]);
    five[0] = erase_person(five[0]);
    return 0;
}

Person* create_default_person()
{
    return create_person("Alfred Hitchcock");
}

Person* erase_person(Person* one)
{
    if (one == NULL) return NULL;
    if (one->name != NULL)
        printf("\tDeleting \"%s\"\n", one->name);
    free(one->name);
    free(one);
    return NULL;
}

Person* create_person(char* name)
{
    Person* one = (Person*)malloc(sizeof(Person));
    if (name == NULL) return one;
    one->name = (char*)malloc(sizeof(strlen(name) + 1));
    if (name == NULL) return one;  // if malloc failed
    strcpy(one->name, name);       // copy all data
    printf("\tAllocating \"%s\"\n", one->name);
    return one;
}

void print_person(Person* input)
{
    if (input == NULL) return;
    printf("\tName: %s\n", input->name);
}

This is the output

        Allocating "Alfred Hitchcock"
        Allocating "Alfred Hitchcock"
        Allocating "Alfred Hitchcock"
        Allocating "Alfred Hitchcock"
        Allocating "Alfred Hitchcock"
    Persons:
        Name: Alfred Hitchcock
        Name: Alfred Hitchcock
        Name: Alfred Hitchcock
        Name: Alfred Hitchcock
        Name: Alfred Hitchcock
    Deleting all Persons
        Deleting "Alfred Hitchcock"
        Deleting "Alfred Hitchcock"
        Deleting "Alfred Hitchcock"
        Deleting "Alfred Hitchcock"
        Deleting "Alfred Hitchcock"
        Allocating "Isac Asimov"
        Name: Isac Asimov
        Deleting "Isac Asimov"

The code just creates an array of pointers to Person, does some stuff with it and deletes everyhing. It may be useful to see how use this things.

from the second code

char *get_string() {
  char *item = "Alfred Hitchcock";
  return item;
}

person *create_default_person() {
  char *value = get_string();
  person *new_person = person_generator(value);
  return new_person;
}

I am expecting when "create_default_person()" function ends, the pointer it created is destroyed from the stack its inside of, then the struct member should be pointing to garbage data right?

When get_string() returns item ends its lifetime. But a char* is returned to the caller. That pointer points to a static string literal "Alfred Hitchcock" that will always be there.

When create_default_person() ends value and new_person reaches their lifetimes also. But a pointer is returned to the caller. That pointer points to a valid Person struct and inside them there is a pointer to the static data declared inside get_string(), the string literalused to initialize item.

about the additional question

This program x.c

char* alfred = "alfred";
int main(void){ return 0; }

generates this assembly code

.LC0:
        .string "alfred"
alfred:
        .quad   .LC0
main:
        push    rbp
        mov     rbp, rsp
        mov     eax, 0
        pop     rbp
        ret

When an executable file x.exe is generated from this code, data for the instructions and the string are for sure recorded into x.exe. When you run the program, x.exe is loaded into memory and includes an area, a static one, with the string, as you can see in a dump from the executable file:

0010760    0000    0000    0000    0000    0000    0000    0000    0000
         \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0011020    6c61    7266    6465    0000    0000    0000    0000    0000
          a   l   f   r   e   d  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0011040    ffff    ffff    ffff    ffff    ffff    ffff    ffff    ffff
        377 377 377 377 377 377 377 377 377 377 377 377 377 377 377 377
0011060    0140    0000    0000    0000    0000    0000    0000    0000
          @ 001  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0

and you see at address 11020 an expected alfred string. When loaded in memory the string will have a fixed static address relative to the beginning of the address the code was loaded to run by the OS, and that will be the one you can access using &alfred[0].

changing the example

#include <stdio.h>
#include <string.h>
const char* alfred = "alfred";

int main(void)
{
    // alfred[0] = 'A'; running this crashes the program
    //                  as the area can not be changed.
    //                  Declaring it 'const' makes the
    //                  compiler work for you and do not
    //                  allow the program to run and crash
    printf(
        "    \"alfred\" is at 0x%X. Last byte is %d\n",
         &alfred[0], alfred[strlen(alfred)]);
    const char* p = alfred;
    printf(
        "    pointer points to 0x%X, value there is "
        "\"%s\" Last byte is %d\n",
        p, p, *(p + strlen(p)));
    return 0;
}

Here the code:

  • prints the address where the string was loaded, and shows that there is a null as the last byte.
  • declare a pointer to this address and use it to print the same things. On the second printf you see pointer arithmetic in C as an alternative to get the last byte of the string
  • note that the area is declared as const char* and undestand that this is good for you, so the compiler will not let you go wrong and trying to change a static literal, instead if just crashing your code later
arfneto
  • 1,227
  • 1
  • 6
  • 13