Surprise: The loop you wrote DOES NOTHING!
Not a single instruction from that loop will actually execute if you run the program after a modern, good compiler has finished processing it.
TL;DR: If you can't access the data later, then you either have a memory leak - or you have overwritten the data. Since you're not using pointers in your code, the data is stored on your call stack - where every variable inside a function call has a fixed slot of memory designated for the duration of the call. This means that NO - it won't eat up all the RAM. This might not be true for all programming languages, though.
Most people skip a lot of basic steps that would help you understand why things are the way they are, and how computers really work. Just a little knowledge may get you a LONG way toward becoming an amazing programmer.
I got a feeling that you are one of those that are curious enough, so I decided to write a long answer...
I've been lucky enough to have been born at around the perfect time to learn how they work, under the hood. People today are NOT so fortunate, for the "banalities" of computers are being sealed behind single waterproofed pieces of magnesium covered by touchable glass screens.
They aren't really banal; they are an amazing engineering achievement. But after much trial and error, engineers and researchers have arrived at something that is quite simple to understand.
This simplicity gives great power to those who have it, when they hide it from others. Everything is simple, once you understand it. If it isn't simple, it won't succeed. That's why most things that still exist, are quite simple to understand. :)
Compiling
When your source code gets compiled, the result is a "bunch of bytes"/blob/char array, whatever you want to call it. I'll call it the source[].
First a little background, which you may choose to skip to the "Code Memory and Data Memory" section below. :)
Target architecture
In the "old days", CPUs did not have an MMU device - so there was no "read only" RAM. However, some computers distinguish between code memory and data memory; most noteworthy are the Harvard architecture and the von Neumann architecture.
Harvard architecture
There is a lot to say about the Harvard architecture, so I suggest you read about it at https://en.wikipedia.org/wiki/Harvard_architecture, but in this context - the important thing is that code belongs in a memory range that can't be accessed by your program code.
I guess it was less "designed by careful consideration of various options", and more the result of natural evolution as computers were being invented; code memory was literally switches and punch cards...
They don't exist anymore...
But the modified Harvard architecture does, and it's really not necessary to understand the difference between that and the next architecture I mention below.
I don't think it is worthwhile to participate in a discussion of whether modern computers are Harvard or von Neumann, because it is very clear that the benefits of the Harvard architecture are being "emulated" in von Neumann computers. There is no clear distinction anymore.
von Neumann architecture
Most computers today are this type of architecture. Software can write to the code memory and to the data memory. There is nothing special about any memory address. But in computers, some software has powers that other software doesn't. Particularly, the KERNEL (drum roll).
In more modern CPU designs, it is possible to virtualize memory addresses. This was previously handled by a special component called an MMU (Memory Management Unit). Whenever the CPU wanted to access a memory address, the MMU would translate that address request to a different virtualized address. Today the MMU is integrated into the CPU itself - but I will talk about the concept as if there is still a separate MMU.
The MMU is the magic little chip that makes your program believe it has a contiguous sequence of addressable memory - so it makes your program very simple to understand, which makes it simple for me to explain it. It was more difficult for programmers when I was a teenager in the 90's and I was (or felt like) the only one in my city who had heard about the internet.
Usually, this translation of memory addresses works on 4 KB (or so) chunks of memory called "pages". The page size is a topic for discussion and varies between systems. If you choose larger page sizes, less memory is taken up by the metadata and lookup tables for these memory pages.
For every page that is allocated, the kernel will tell the MMU to tag it with a 'process owner ID', an 'is swapped to disk' flag, an 'is shared' flag, an 'executable' flag, a 'read only' flag and an actual physical memory address. It might not be exactly these particular tags, but I wanted to illustrate the capabilities that the computer has for managing memory addresses.
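If it helps to see it as code, here is a hypothetical struct of that per-page metadata. Real page-table entries are packed bit layouts defined by the CPU architecture - every field name here is my own invention for illustration.

```c
#include <stdint.h>

/* A hypothetical, simplified page-table entry.  Real entries are
 * architecture-defined bitfields, not a friendly struct like this. */
struct page_entry {
    uint64_t physical_address; /* where this 4 KB page really lives in RAM */
    uint32_t owner_pid;        /* the 'process owner ID' tag */
    uint8_t  swapped_to_disk;  /* accessing it triggers a fault -> kernel */
    uint8_t  shared;           /* mapped into more than one process */
    uint8_t  executable;       /* CPU may fetch instructions from it */
    uint8_t  read_only;        /* writes trigger a fault */
};
```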
If a program attempts to access a memory address that is swapped to disk, the MMU will put some electricity on a pin connected to the CPU. When the CPU feels the electric jolt from that, it immediately squirts out all the data that is stored internally in its registers and starts processing instructions somewhere in the kernel. This is what an interrupt is, under the hood. It's nothing magical. It's just something that causes the CPU to jump to some code somewhere else, while at the same time ensuring that the kernel can jump back again - pretending nothing happened. We call it multitasking.
"Unfortunately", I know a lot of stuff about computers, so I have a tendency to interrupt myself to squirt out more side notes. Maybe I'm that guy that constantly blurts out Did you know that...., while most people roll their eyes. Not because they knew what I was about to say, but because most people don't care - they just accept how things are and move on with what they care about. In my experience, understanding things is more valuable than knowing things.
A side side note: On iOS devices, code memory is automatically tagged as read-only and executable, and everything else is writable and not executable. This makes the OS inherently much less vulnerable to many forms of attack - but it also makes it impossible to bring your own advanced functionality like JIT compilation. This means you are forced to use the Apple-provided technologies, instead of third-party features that depend on JITing: fast JavaScript engines, fast scripting languages, regular expression matching, bytecode-based programming languages such as Java and .NET.
So, Android lovers like to attack iPhone lovers, saying that their phone is much more customizable. But you now understand that there are technical arguments to be made for both choices.
Do you want to have the ability to put a flappy bird game on the start screen, or do you want your mobile device developer to prioritize security first and over time play catch up copying the best ideas from Android?
Code Memory and Data Memory
Code memory is simply a range of memory, and data memory is also a range of memory. Most of the time there is no way to distinguish between them. When you allocate memory, you get a pointer to an address of apparently contiguous memory (which is mapped by an MMU).
The important lesson is: In some operating systems, code memory is not writable, and data memory is not executable. In other systems, the application decides which of its allocated memory is executable, and all of it is writable. Finally, there are systems where the entire computer memory is writable.
Loading your program
When the OS kernel receives a call to execute source[], these are the most important things that happen, as far as you are concerned:
- The source[] is placed somewhere in RAM.
- The kernel tags the memory pages it allocated for your program as executable and records some other metadata which it will later use to switch between your program and other processes in the system.
- The kernel tells the MMU to enable all memory pages that belong to your process.
- The kernel sets a special "timeout interrupt" in the CPU, which ensures that after a certain slice of time, the CPU will jump to some code in the kernel memory.
- The kernel updates the "program counter" register ("PC") in the CPU, which holds the memory address of the next instruction to evaluate, so that it points to wherever source[0] is located.
The string "I am wasting memory" is part of your source[]. You can probably find it somewhere around source[50] or so. The last byte of that string will be \0 - a null byte. After that, you'll find more CPU instructions that came from your program.
Now you see why it is so dangerous to write a string into memory without checking that it isn't longer than the allocated buffer? If somebody provided you with a string that contains instructions, those instructions might get executed. That is why I prefer the Apple/iOS way of better safe than sorry, and I would prefer this memory to be read-only - OR to use managed code like Dalvik, but that doesn't help in the Android case, since Android allows native binaries as well.
Compiled code is just bytes, and so are any strings in the source
In your source example:
for (int i = 0; i < 100000; i++) {
    char local[] = "I am wasting memory";
}
The compiled code will be stored somewhere in RAM, as bytes of data. The bytes are not stored in any particular "string" form. You can read them as char or uint8_t or even double values - depending on the pointer type you use when pointing to that memory address.
The first few bytes of your binary file are some boilerplate code from the C compiler that manages a few things, like setting up the function stack.
The Stack
When the CPU starts reading instructions from your program, these first few bytes will reserve a range of memory which is set aside, and which we refer to as the stack.
The stack can be thought of as a linked list of structs.
Each function in your program has a hidden struct that represents the local variables you're using inside the function. So when a function call is performed, that struct is appended to the linked list. In your case:
/* the "secret function struct" */
struct theSecretStructForYourFunction {
    int i;          // 4 bytes goes here (for example)
    char local[20]; // 20 bytes: "I am wasting memory" plus its \0
};

const int theSecretMemoryOffsetForYourFunction = 123;

for (int i = 0; i < 100000; i++) {
    char local[] = "I am wasting memory";
}
Running your program
When you run your program, the first stack frame is the "global" scope. This first frame contains any variables that have been declared outside of any functions. You can just as easily think of it as just another function - except it doesn't have a name.
Invoking a function
So when your function is invoked, a special offset_to_the_top_of_the_stack value is incremented by the size of theSecretStructForYourFunction (24 bytes in this example). Remember that the program has already reserved a chunk of memory for your stack.
The structs you define in a C program are NOT compiled into the program. They are simply lookup information for the compiler, so that it knows how the file should be compiled. For example, if you have an array of structs where each struct totals 8 bytes, then the compiler knows it needs to multiply the index by 8 whenever you want to access an arbitrary element of that array. That's why it is helpful to have a .h file when we want to use a third-party library.
Processing the function and NOT consuming all the RAM
Now the CPU starts processing your loop - looking up the i value directly from the stack, and also the local[] value directly from the stack.
For every step of the loop:
1. If NOT my_local_stack->i < 100000, jump over the next three instructions.
2. Copy the bytes of "I am wasting memory" into my_local_stack->local[].
3. my_local_stack->i++
4. jmp (address of step 1)
Conclusion
This won't consume any more memory. In fact, a good compiler will probably rewrite your program in a few steps:
for (int i = 0; i < 100000; i++) {
    char local[] = "I am wasting memory";
}
becomes
char local[] = "I am wasting memory";
for(int i=0;i<100000;i++);
which becomes:
char local[] = "I am wasting memory";
int i=100000;
which is finally compiled into a source[] that DOES NOTHING:
char source[] = { 'I',' ','a','m',' ','w','a','s','t','i','n','g',' ','m','e','m','o','r','y', 0, 0x1, 0x86, 0xA0 };
(The last three bytes, 0x01 0x86 0xA0, are just the constant 100000.)