Does garbage collection happen when we initialize a char array with a string literal in c?

Question

When we write the following line of code in C,

      char local_arr[] = "I am here";

the literal "I am here" gets stored in the read-only part of memory (say RM). How I visualize it is that it gets stored contiguously in RM (is that right?). Then the array local_arr (i.e. the local array) copies this array index by index from its location in RM.

But what happens to the literal after the local_array copies it? Is it lost thereby causing memory leaks? Or is there some sort of garbage collector like in Java that cleans up unreferenced objects?

For example if i write a piece of code as follows :

for(int i=0;i<100000;i++)
    char local[] = "I am wasting memory";

would I not run out of memory? Will each iteration create a new instance of an identical literal? Or will they all refer to the same literal, since the value of the literal is the same every time?

Does RM belong to the heap memory or a specialized segment in heap?

Also the local array is stored in the stack, right? What if I use a dynamic array or global array. What happens then?

Similar to [C, char type memory](http://stackoverflow.com/questions/24006529/c-char-type-memory?lq=1) and [Pointers To Const Char](http://stackoverflow.com/questions/18003537/pointers-to-const-char) — Shafik Yaghmour, Jun 03 '14 at 15:10
C has no GC. Some people like to pretend you could implement one, but you can't do so reliably and transparently. The best you can get is a "conservative" GC, unless your program can somehow tell the allocator every time you copy a pointer somewhere. Otherwise you run smack into the halting problem. — cHao, Jun 03 '14 at 15:10
The initializer is stored *somewhere*, but it's not for you to care about where. It might be right in the text segment. — Kerrek SB, Jun 03 '14 at 15:10
@ShafikYaghmour there are other parts to question that is not similar to what you pointed out. — Dubby, Jun 03 '14 at 15:12
And it might be hacked in parts, which are stored as part of the instruction initializing the local variable. — Deduplicator, Jun 03 '14 at 15:12
The unnamed object created by the string literal exists throughout the duration of the program. — pmg, Jun 03 '14 at 15:12
No, @Deduplicator: a string literal is an array of char; it can't "be hacked in parts" — pmg, Jun 03 '14 at 15:13
@pmg: The string literal is used as an array initializer, not as an anonymous object in its own right. If it is more efficient to do so, it will be saved as an anonymous object. Otherwise it won't. — Deduplicator, Jun 03 '14 at 15:14
At first sight I thought this was very easy to answer. Now I realize it is not that easy! — haccks, Jun 03 '14 at 15:16
@pmg: Does that mean that C can't optimize away the array? If the only instance of a literal is in the initialization of an array, is it not allowed to say, for example, that `char x[] = "Hello!!";` translates to `push '\0!!o'`, `push 'lleH'`? — cHao, Jun 03 '14 at 15:17
The compiler is allowed to do anything as long as it adheres to the "as if" rule — pmg, Jun 03 '14 at 15:19
@Deduplicator does that mean that the compiler will decide if the literal is worth storing or discarding once the array initialization is done? — Dubby, Jun 03 '14 at 15:21
@haccks I think your answer was right and totally what the questioner was asking for… Maybe I overlooked something, but I think the main problem is that Dubby isn't aware that a `"foo"` construct isn't a string constant when used in initializers, just as you mentioned. — mafso, Jun 03 '14 at 15:21
@Dubby Even if you think about `"foo"` as being a string literal (which it isn't in the terms of the C standard) there is _never_ a need to keep the "original" (uncopied) version of the array. You couldn't access it anyway. — mafso, Jun 03 '14 at 15:23
@mafso but if the "foo" construct isn't a literal then where is it stored? Or is it temporarily created and deleted after use? — Dubby, Jun 03 '14 at 15:31

Deduplicator · Accepted Answer · 2014-06-04T14:14:10.083

C does not have garbage collection, so if you forget to deallocate allocated memory with the proper deallocator, you get a memory leak.
While sometimes a conservative garbage collector like the Boehm collector is used, that causes lots of extra headaches.

Now, there are four types of memory in C:

static memory: This is valid from start to end. It comes in flavors logically read-only (writing is Undefined Behavior) and writeable.
thread-local memory: Similar to static memory, but distinct for each thread. This is new-fangled stuff, like all threading support.
automatic memory: Everything on the stack. It is automatically freed by leaving the block.
dynamic memory: What malloc, calloc, realloc and the like return on request. Do not forget to release it with free or whichever deallocator matches the allocator you used.

Your example uses automatic memory for local_arr and leaves the implementation free to initialize it to the provided literal whichever way is most efficient.

char local_arr[] = "I am here";

That can mean, inter alia:

Using memcpy/strcpy and putting the literal into static memory.
Constructing the array on the stack by pushing the parts, thus putting it into the executed instructions.
Anything else deemed opportune.

Also of interest, C constant literals have no identity, so can share space.
Anyway, using the as-if rule, many times static (and even dynamic / automatic) variables can be optimized away.

mafso · Answer 2 · 2014-06-03T16:04:56.227

Not an answer (Deduplicator already has given a good one, I think), but maybe this'll illustrate your problem…

Consider the following C code:

#include <stdio.h>

int main() {
    char foo[] = "012";
    /* I just do something with the array to not let the compiler
     * optimize it out entirely */
    for(char *p=foo; *p; ++p) {
        putchar(*p);
    }
    putchar('\n');
    return 0;
}

with the assembler output (with GCC on my machine):

[...]
.LC0:
    .string "012"
[...]
main:
[...]
    movl    .LC0(%rip), %edi

where you have a string in read-only memory (and that string will persist from program startup until exit). When I change the line initializing foo to

    char foo[] = "0123";

GCC thinks it's worth doing it this way:

    movl    $858927408, (%rsp)  # write 858927408 long (4 bytes) to where the stack pointer points to
    movb    $0, 4(%rsp)         # write a 0 byte to the position 4 bytes after where the stack pointer points to

858927408 is 0x33323130 (0x30 is the ASCII code for '0', 0x31 for '1' and so on); in the latter case the string isn't stored in read-only memory, it is stored in the instructions themselves. In both cases, the array you eventually access is always on the stack. And you never have the ability to access the string literal in read-only memory in such a case, even if it exists.

HTH

Edenia · Answer 3 · 2014-06-03T15:36:01.857


Arrays are stored in contiguous memory locations (cml), and where they end up depends on their scope. For example, global (static) arrays are typically placed in the data segment (zero-initialized ones in the BSS, "Block Started by Symbol"), while local arrays are created on the stack. It is a string because the elements follow one another in memory, forming a sequence of characters. Editing according to the new changes to the question: Doing so:

char str[] = "Hello World";

You do:

char str[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0'};

Since the last character is '\0' (NUL, i.e. 0), the string is terminated inside the array itself, so nothing reads past the end of it and you won't get leaks. That's just how C handles char arrays and especially strings. They are null-terminated strings. A lot of functions like strlen work only if there is a null terminator.

Also, if you use dynamically created arrays they will be stored in the heap. As far as I know the heap is nothing special; basically it provides an environment for allocation and manages the memory for that purpose.

edited Jun 03 '14 at 15:36

answered Jun 03 '14 at 15:16


You forgot the complexities introduced by that literal being used as an initializer to an array. – Deduplicator Jun 03 '14 at 15:17
Nearly this whole answer is implementation specific, and the parts that aren't don't really answer the question. – cHao Jun 03 '14 at 15:23
The question was changed, and there are actually like 5-6 questions figuring in here. – Edenia Jun 03 '14 at 15:28
So if I store char str[] = {'w','o'}; then the object persists in read only memory and appending it with NULL removes it after initialization? Or am I misinterpreting the second last para of your answer? – Dubby Jun 03 '14 at 15:29
The null terminator tells where is the end of the string. If the end is unknown it will result in u/b. But mostly it will cause leaks, you would have access to characters from the entire memory in the same ns – Edenia Jun 03 '14 at 15:33
Note, the UB related to the lack of a NUL at the end, is entirely related to the array's status as a "NUL-terminated byte string" (NTBS, or "string" for short :P), not a hard-set rule for all arrays of `char` (or arrays would automagically append the NUL as well). You can legitimately say `char c[] = { 'a', 'b', 'c' };`, and it'll work *as an array*. But functions that expect a NTBS (including nearly all string functions) may(!) choke on it, since by definition a NTBS includes the NUL at the end. – cHao Jun 03 '14 at 16:12
I am able to say what happens in most of the cases, not to say what happens in any particular case. – Edenia Jun 03 '14 at 16:14

frodeborli · Answer 4 · 2021-03-02T15:20:05.200

Surprise: The loop you wrote DOES NOTHING!

The loop you wrote will do NOTHING, not a single instruction you wrote will actually happen if you run that program after a modern good compiler finishes processing it.

TLDR: If you can't access it later, then you either have a memory leak - or you have overwritten the data. Since you're not using pointers in your code, the data is stored on your call stack - where every variable inside a function call has a fixed permanent slot of memory designated. This means that NO - it won't eat up all the RAM. This might not be true for all programming languages though.

Most people skip a lot of basic steps that will help you understand why things are the way they are, and how computers really work. Just a little amount of knowledge may get you a LONG way into becoming an amazing programmer.

I got a feeling that you are one of those that are curious enough, so I decided to write a long answer...

I've been lucky enough to have been born at around the perfect time to learn how they work, under the hood. People today are NOT so fortunate, for the "banalities" of computers are being sealed behind single waterproofed pieces of magnesium covered by touchable glass screens.

They aren't really banal, they are really an amazing engineering achievement. But after much trial and error, engineers and researchers have arrived at something that is quite simple to understand.

This simplicity gives great power to those who have it, when they hide it from others. Everything is simple, once you understand it. If it isn't simple, it won't succeed. That's why most things that still exist, are quite simple to understand. :)

Compiling

When your source code gets compiled, the result is a "bunch of bytes"/blob/char-array whatever you want to call it. I'll call it the source[].

First a little background, which you may choose to skip to the "Code Memory and Data Memory" section below. :)

Target architecture

In the "old days", CPUs did not have an MMU device - so there is no "read only" RAM. However, some computers distinguish between code memory and data memory - noteworthy there is the Harvard architecture, and the von Neumann architecture.

Harvard architecture

There is a lot to say about the Harvard architecture, so I suggest you read about it at https://en.wikipedia.org/wiki/Harvard_architecture, but in this context - the important thing is that code belongs in a memory range that can't be accessed by your program code.

I guess it was less "designed by careful consideration of various options", and more the result of natural evolution as computers were being invented; code memory was literally switches and punch cards...

They don't exist anymore...

But the modified Harvard architecture does, and it's really not necessary to understand the difference between that and the next architecture I mention below.

I don't think it is worthwhile to participate in a discussion of whether or not modern computers are Harvard or von Neumann, because it is very clear that the benefits of the Harvard architecture are being "emulated" in the von Neumann computers. There is no clear distinction anymore.

von Neumann architecture

Most computers today are of this type. Software can write to the code memory and to the data memory. There is nothing special about any memory address. But in computers, some software has powers that other software doesn't. Particularly, the KERNEL (drum roll).

In more modern CPU designs, it is possible to virtualize memory addresses. This was previously a special component called an MMU (Memory Management Unit). Whenever the CPU wanted to access a memory address, the MMU would translate that address request to a different virtualized address. Today, I suspect the MMU is internal in the CPUs - but I will talk about the concept as if there is an MMU still.

The MMU is the magic little chip that makes your program believe it has a continuous sequence of addressable memory - so it makes your program very simple to understand, which makes it simple for me to explain it. It was more difficult for programmers when I was a teenager in the 90's and I was (or felt like) the only one in my city that had heard about the internet.

Usually, this translation of memory addresses works on 4 KB (or so) chunks of memory called "pages". The page size is a topic for discussion and probably varies. If you choose larger page sizes, less memory is taken for metadata and lookup tables for these memory pages.

For every page that is allocated, the kernel will tell the MMU to tag it with a 'process owner ID', a 'is swapped to disk', an 'is shared' flag, a 'executable' flag, a 'read only' flag and an actual physical memory address. It might not be exactly these particular tags, but I wanted to illustrate the capabilities that the computer has regarding managing memory addresses.

If a program attempts to access a memory address that is swapped to disk, the MMU will put some electricity on a pin connected to the CPU. When the CPU feels the electric jolt from that, it immediately squirts out all the data that is stored internally in its registers and starts processing instructions somewhere in the kernel. This is what an interrupt is, under the hood. It's nothing magical. It's just something that causes the CPU to jump to some code somewhere else, while at the same time ensuring that the kernel can jump back again - pretending nothing happened. We call it multitasking.

"Unfortunately", I know a lot of stuff about computers, so I have a tendency to interrupt myself to squirt out more side notes. Maybe I'm that guy that constantly blurts out Did you know that...., while most people roll their eyes. Not because they knew what I was about to say, but because most people don't care - they just accept how things are and move on with what they care about. In my experience, understanding things is more valuable than knowing things.

A side side note: On iOS-devices, code memory is automatically tagged as read-only executable, and everything else is writable and not executable. This makes the OS inherently much less vulnerable to many forms of attacks - but it also makes it impossible to bring your own advanced functionality like jitting. This means you are forced to use the Apple provided technologies, instead of using third party features that depend on jitting; fast javascript engines, fast scripting languages, regular expression matching, bytecode based programming languages such as java and .NET.

So, Android lovers like to attack iPhone lovers, saying that their phone is much more customizable. But you now understand that there are technical arguments to be made for both choices.

Do you want to have the ability to put a flappy bird game on the start screen, or do you want your mobile device developer to prioritize security first and over time play catch up copying the best ideas from Android?

Code Memory and Data Memory

Code memory is simply a range of memory, and data memory is also a range of memory. Most of the time there is no way to distinguish this. When you allocate memory, you get a pointer to an address of apparently continuous memory (which is mapped by an MMU).

The important lesson is: In some operating systems, code memory is not writable, and data memory is not executable. In other systems, the application decides which of its allocated memory is executable, and all of it is writable. Finally, there are systems where the entire computer memory is writable.

Loading your program

When the OS kernel receives a call to execute source[], these are the most important things that happen as far as you are concerned:

The source[] is placed somewhere in RAM.
The kernel tags the memory pages it allocated for your program as executable and records some other metadata which it will later use to switch between your program and other processes in the system.
The kernel tells the MMU to enable all memory pages that belong to your process.
The kernel sets a special "timeout interrupt" in the CPU, which ensures that after a certain slice of time, the CPU will jump to some code in the kernel memory.
The kernel updates the "program counter" register ("PC") in the CPU, which holds the memory address of the next instruction to evaluate, so that it points to wherever source[0] is located.

The string "I am here" is part of your source[]. You can probably find it back somewhere around source[50] or so. The last byte of that string will be \0 - a null byte. After that, you'll find more CPU instructions that came from your program.

Now you see why it is so dangerous to write a string into memory, without checking that it isn't longer than the allocated string? If somebody provided you with a string that has instructions, those instructions might get executed. Which is why I prefer the Apple/iOS way of better safe than sorry, and I would prefer this memory to be read-only - OR to use managed code like Dalvik, but that doesn't help in the Android case since it allows native binaries as well.

Source code is just bytes, also any strings in the source

In your source example:

    for(int i=0;i<100000;i++)
        char local[] = "I am wasting memory";

The source code will be stored somewhere in RAM, as bytes of data. They are not stored in any particular "string" form. You can read them as char or uint8_t or even float64 values - depending on the struct you use when pointing to that memory address.

The first few bytes of your binary file is some boiler plate code from the C compiler that manages a few things like the function stack.

The Stack

When the CPU starts reading instructions from your program, these first few bytes will malloc a range of memory which is set aside and we refer to it as the stack.

The stack can be thought of as a linked list of structs.

Each function in your program has a hidden struct that represents the local variables you're using inside the function. So when a function call is being performed, that struct is appended to the linked list. In your case:

    /* the "secret function struct" */
    struct theSecretStructForYourFunction {
        int i;          // 8 bytes goes here (for example)
        char local[];   // 8 bytes goes here
    };
    const int theSecretMemoryOffsetForYourFunction = 123;

    for(int i=0;i<100000;i++)
        char local[] = "I am wasting memory";

Running your program

When you run your program, the first stack frame is the "global" scope. This first frame contains any variables that have been declared outside of any functions. You can just as easily think of it as just another function - except it doesn't have a name.

Invoking a function

So when your function is invoked, a special offset_to_the_of_the_stack value is incremented by 8 (because that's what's needed according to theSecretStructForYourFunction). Remember that the program has already malloc'ed a chunk for your stack.

The structs you define in a C-program are NOT compiled into the program. They are simply lookup information for the compiler, so that it knows how the file should be compiled. For example, if you have an array of structs that totals to 8 bytes, then it knows that you need to multiply the offset by 8 whenever you want to access an arbitrary index of that array. That's why it is helpful to have a .h file when we want to use a library from third parties.

Processing the function and NOT consuming all the RAM

Now the CPU starts processing your loop - looking up the i value directly from the stack, and also the local[] value directly from the stack.

For every step of the loop:

If NOT my_local_stack->i < 100000, jump over the next three instructions.
Copy the characters of "I am wasting memory" (including the terminating '\0') into my_local_stack->local[], the same slot on every pass.
my_local_stack->i++
jmp (address of step 1)

Conclusion

This won't consume any more memory. In fact, a good compiler will probably rewrite your program in two steps:

    for(int i=0;i<100000;i++)
        char local[] = "I am wasting memory";

becomes

    char local[] = "I am wasting memory";
    for(int i=0;i<100000;i++);

which becomes:

    char local[] = "I am wasting memory";
    int i=100000;

which is finally compiled to source code that DOES NOTHING.

unsigned char source[] = { 'I',' ','a','m',' ','w','a','s','t','i','n','g',' ','m','e','m','o','r','y', 0x1, 0x86, 0xA0 };

Sentimental · Answer 5 · 2014-06-03T15:22:25.623

String literals are stored in the static area. When you copy a string literal to a local array, there will be two copies: one in the static area and one on the stack. The copy in the static area will not be deleted. There is no GC in C. But if a function returns a pointer to the literal, the caller can still access the string.

#include <stdio.h>

char *returnStr()
{
    char *p="hello world!";
    return p;
}

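char *returnStr2()
{
    char p[]="hello world!";   /* automatic array: its lifetime ends when the function returns */
    return p;                  /* deliberately wrong: returns a dangling pointer */
}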
int main()
{
    char *str=NULL;
    char *str2=NULL;
    str=returnStr();
    str2 = returnStr2();
    printf("%s\n", str);
    printf("%s\n", str2);
    getchar();
    return 0;
}

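The first function returns a pointer to the literal in the static area, so main prints the string. The second function returns a pointer to an automatic array whose lifetime has already ended, so using it is undefined behavior; in practice it often prints garbage.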

cHao · Answer 6 · 2014-06-15T18:59:01.213

The program doesn't create a new string each time the loop hits it. There's only one string, which already exists, and the literal simply refers to that array.

What happens is that when the compiler sees a regular string literal, it creates[1] a static array of char (C11 §6.4.5/6, C99 §6.4.5/5) containing the string's contents, and adds the array (or code to create it) to its output.[1]

The only allocation that happens in the function is with char local_arr[] =..., which allocates enough space for a copy of the string's contents. Since it's a local, it is effectively released when control leaves the block that defined it. And due to the way most compilers implement automatic storage (even for arrays), it basically can't leak.

[1] (Each literal might end up in its own array. Or, identical string literals might refer to the same array. Or, in some cases, the array might even be eliminated entirely. But that's all implementation-specific and/or optimization-related stuff, and is irrelevant for most well-defined programs.)