I have played a little bit with C and written the following code:

#include<stdio.h>
#include<stdlib.h>
int main() {
    char* value = malloc(5 * sizeof(char));
    int vect[3];
    printf("%d\n", value[135151]);
    int i, count = 0;
    for(i = 0; i < 135152; i++) {
        if(value[i]) {
            count++;
            printf("position is %d, value is %d and i change it with 42\n", i, value[i]);
            value[i] = 42;
            vect[count - 1] = i;
        }
    }
    printf("count is %d\n", count);

    printf("pointer is at location %p\n", value);
    printf("changed values are %d %d %d\n", value[vect[0]], value[vect[1]],
                                                        value[vect[2]]);

    return 0;
}

After several tries on my laptop, I found that printing value[135152] gives a segfault, while printing value[135151] prints 0.

I was then curious whether there are nonzero values in this interval, and 3 nonzero values were shown.

After that, I tried to change them all to 42. (I forgot to mention that over many program executions, 20+, the same nonzero values stayed at the same positions even though the pointer was shown at a different location each time, such as 0xbe7010 or 0x828010. This made me think that the pointer address is virtual, but the location is the same.)

After modifying those values, I printed them at the end just to be sure, and all 3 showed 42. But at the next program execution the previous values were shown again, as if I had never modified that memory zone.

I will give you 3 consecutive outputs of mine:

0
position is 24, value is -31 and i change it with 42
position is 25, value is 15 and i change it with 42
position is 26, value is 2 and i change it with 42
count is 3
pointer is at location 0x21bb010
changed values are 42 42 42


0
position is 24, value is -31 and i change it with 42
position is 25, value is 15 and i change it with 42
position is 26, value is 2 and i change it with 42
count is 3
pointer is at location 0x20d1010
changed values are 42 42 42


0
position is 24, value is -31 and i change it with 42
position is 25, value is 15 and i change it with 42
position is 26, value is 2 and i change it with 42
count is 3
pointer is at location 0x19d0010
changed values are 42 42 42

Could you please tell me why those values persist even after changing?

Also, why is the pointer address changing while the memory zone stays the same? (I suspect there is a bijective function between physical and virtual memory that changes every time I execute the program.)

Thank you for your help and sorry for this Wall of Text!

Emanuel
  • the code should always check the returned value from malloc (and family) to assure the operation was successful – user3629249 Jan 04 '15 at 00:14
  • this line: 'printf("%d\n", value[135151]);' is accessing memory outside the allocated area. This results in undefined behaviour which can lead to a seg fault event. – user3629249 Jan 04 '15 at 00:17
  • this line: 'if(value[i]) {' when 'i' is greater than 4, this results in undefined behaviour which can lead to a seg fault event. What happens when undefined behaviour is performed is 'undefined'. It could be anything, including the behaviour you have seen. Note: there are two more instances of undefined behaviour in this same loop. Rather than worrying about some random event being repeatable, far better to fix the program. – user3629249 Jan 04 '15 at 00:21
  • Are you asking in the context of standard C (in which case the answer is "undefined behaviour means weird things can happen") or in the context of the whole system ("on this particular operating system version, the location of the heap is randomized, but everything after that is deterministic and always allocates stuff in the same place relative to the heap, so the other code wrote stuff in the same place relative to your allocation ...")? – user253751 Jan 04 '15 at 04:13

2 Answers

If I understand your question correctly, the answer is quite a bit simpler than you are thinking. On each program execution, malloc allocates memory by asking the operating system for it, but neither malloc nor the OS has any particular reason to give you the same space in memory each time your program executes. As far as I know, unless you are running your program as a kernel-level program, there really isn't any way to ensure you get the same memory address each time. Even then you likely won't see the same value, because some other program may have written to the memory in the meantime (or the computer was shut off, since malloc allocates from volatile memory). What you are looking for is possibly saving your array to a file and reading from that file on program execution.

Almost everything you are doing is undefined, so I cannot give you an adequate answer to the question of why it is happening. Here are some of the different kinds of undefined behavior you are invoking:

printf("%d\n", value[135151]);

Reading past the end of your array, or more formally dereferencing any pointer whose offset is equal to or larger than the size of your array (value[5] or higher), is undefined behavior. I have written many programs where even that dereference causes a segfault, but in this case it appears your operating system or initialization libraries (the code that runs before main) are allocating your program a bunch of memory without you having to ask.

if(value[i])

The value in a variable or memory space which has not been assigned a value by your program is undefined. It would be perfectly legitimate for all of the memory to be zero, or for it to have whatever value happened to be there from before. Going to your question about reading such memory, it's clearly being assigned specific values by your operating system either after it is released or before it is allocated to your program. One important reason the OS might do this is security - if a program gets the value entered into memory from the program before it, it can read that other program's data, which would be very bad if the previous program was converting plaintext passwords to hashes, for example.

value[i] = 42;

This probably goes without saying, but assigning a value to a memory location which was not allocated to you is also undefined behavior.

EDIT: In response to comment: Undefined behavior means that the standard doesn't define what happens when you do it. Obviously if a program compiles and runs, it must exhibit some behavior, but undefined behavior may be totally different depending on the compiler, standard library version, operating system, and a variety of other factors. In your case, all these variables come together to produce the behavior you are seeing, but without picking apart every detail of your compiler, environment, hardware, etc., we cannot tell you why. More importantly, that behavior may (and likely would) be totally different if, for example, I compiled and ran this code using a different operating system and compiler.

As a side note, I tried this on a few compilers and got exactly the same results on clang 3.5.0 on Linux Red Hat and similar with gcc, so my best guess is that the information has something to do with malloc's implementation, possibly metadata for free to use when deallocating the memory.

IllusiveBrian
  • Why are the printed values `-31 15 2` every time? – Adrian Jan 04 '15 at 00:05
  • At each program execution, and I did about 20, I got the same nonzero values at the same positions, which made me think, and I still think, that C in this case gave me the same memory zone at each execution. So, in order to exploit this, I wanted to overwrite the nonzero values with other values (thinking maybe it would allocate me another memory zone). But, surprise! It allocated the same memory area to me, and even more, the values which I had overwritten were back at their previous values. – Emanuel Jan 04 '15 at 00:08
  • Almost everything you are doing is undefined, so I cannot give you an adequate answer to that question. I'll try to go through each instance of undefined behavior in my answer for you. – IllusiveBrian Jan 04 '15 at 00:11
  • Previously, I allocated 2 char arrays using malloc, one of size 3 and one of size 4. I discovered that vect1[32] was vect2[0]: I assigned vect2[0] a nonzero value and went from char to char starting at vect1[0] until I figured that out. After I changed vect1[32] to another value, vect2[0] changed too. Is that undefined behaviour as well? I still can't understand how undefined behaviour (going back to the problem of the code I posted here) can cause the same positions to keep the same values (too much of a coincidence in my opinion). – Emanuel Jan 04 '15 at 00:32
  • I think you should make the "almost everything you're doing is undefined behaviour" comment the leading paragraph in your answer. Any speculation about what does and does not happen is coincidental; the behaviour _is_ undefined and anything is possible (including seeming to work). You might look at my [answer](http://stackoverflow.com/questions/27757562/27758484#27758484) to another question to see what I said about a less egregious case of invoking undefined behaviour (but behaviour is either undefined or defined, or occasionally unspecified, and degree doesn't really matter). – Jonathan Leffler Jan 04 '15 at 00:58
  • @JonathanLeffler Not a bad idea. – IllusiveBrian Jan 04 '15 at 01:03
  • By metadata, do you mean something like the actual allocated size was not 5 but at least 26? (Because the positions of the values that could not be changed were 24, 25 and 26.) By "the same results", do you mean it happened just like it happened to me? Then this may be the answer, combined with immibis' answer. – Emanuel Jan 04 '15 at 14:21

I'll assume you aren't satisfied with "undefined behaviour means anything can happen", and want to know things that are not defined by C itself. Because it's not defined by C, the following is partially speculation:

Address-Space Layout Randomization, or ASLR, is a feature of most modern operating systems. It randomizes the starting address of each major memory area in your program. Its purpose is to make certain types of security vulnerabilities harder to exploit. That's not in the scope of this question.

Also, there is code that runs before main. It initializes various things within the standard library. When I say "your program", I am including this code.

Because of ASLR, the heap will start at a different address in every process. However, because your program doesn't use any other randomness (it is deterministic apart from ASLR), it always allocates memory blocks with the same size and in the same order. Since malloc is not random - apart from the starting address of the heap - on your operating system, it allocates memory in the same "pattern". Perhaps your memory block is always at (start_of_heap + 123400) and something else always gets a memory block at (start_of_heap + 123424) - in this case, the other memory block is always (your_memory_block + 24), even though the exact address varies.

What is this other memory block? It's not possible for me to guess that accurately, but given that your program doesn't crash, it's likely that your program never uses it again. It might be important for some feature you aren't using, or it might be book-keeping information for the memory allocator (which never sees your overwritten values, since you never call malloc or free after that).

P.S. Overwriting memory that's not yours is a great way to cause unpredictable crashes that you can't figure out. You should seriously avoid doing this in any real programs. It's also a great way to write programs that aren't portable - maybe OSX stores something more important at that location, or maybe it doesn't even allocate that location (so you segfault when accessing it).

user253751
  • I wrote this program in order to learn some things about C, so these "mistakes" were made on purpose, out of curiosity about what happens after. So, basically, you mean that the values at positions 24, 25 and 26, which I tried to modify without success, could somehow have been used by the memory allocator. If so, why doesn't C have a rule stating that everything not allocated should give a segfault when accessed? I said earlier that I was able to modify a char* vect2 through char* vect1, by figuring out that vect1[32]==vect2[0], which is bad. – Emanuel Jan 04 '15 at 14:18
  • @Emanuel Part of C's design philosophy was not to restrict programmers from doing "bad" things, for the corner cases where they had done their homework and knew what they were doing. In Java, for example, if you allocate an array and then try to read past its last index, it will always throw an exception. In this case, it is also impossible for a compiler to know exactly how much space you had allocated for your array, since that information is stored in a location determined by `malloc` at run-time, and the size itself may only be determined at run-time. – IllusiveBrian Jan 04 '15 at 15:34
  • So in Java the allocation is done at compilation (which helps determine if I read past its index, but makes the program slower as a consequence), while in C it's done at run-time, mostly for speed considerations? Also, since there is some metadata to help the "free" function know how much space to free, couldn't that information be used to determine if I read or write on an unallocated index? – Emanuel Jan 04 '15 at 16:32
  • @Emanuel You don't get segfaults from accessing memory that isn't yours because how would the CPU know whether you're supposed to be accessing that memory? Another part of C's design philosophy is to be lightweight. When you write `value[135151]`, it gets compiled into something like `address = (variable "value") + 135151; read memory at "address"`. Both of those are perfectly valid operations: the first one is just addition, and the second one is a memory access. – user253751 Jan 04 '15 at 23:02
  • @Emanuel It is theoretically possible for a compiler to generate code that crashes whenever you access memory outside the bounds of an object, but I don't know of any compilers that do. Mostly because it would make all programs a few times slower and waste memory as well. – user253751 Jan 04 '15 at 23:08