How dangerous is it to access an array out of bounds?

Question

How dangerous is accessing an array outside of its bounds (in C)? It can sometimes happen that I read from outside the array (I now understand I then access memory used by some other parts of my program or even beyond that) or I am trying to set a value to an index outside of the array. The program sometimes crashes, but sometimes just runs, only giving unexpected results.

Now what I would like to know is, how dangerous is this really? If it damages my program, it is not so bad. If on the other hand it breaks something outside my program, because I somehow managed to access some totally unrelated memory, then it is very bad, I imagine. I read a lot of 'anything can happen', 'segmentation might be the least bad problem', 'your hard disk might turn pink and unicorns might be singing under your window', which is all nice, but what is really the danger?

My questions:

1. Can reading values from way outside the array damage anything apart from my program? I would imagine just looking at things does not change anything, or would it for instance change the 'last time opened' attribute of a file I happened to reach?
2. Can setting values way outside of the array damage anything apart from my program? From this Stack Overflow question I gather that it is possible to access any memory location, that there is no safety guarantee.
3. I now run my small programs from within XCode. Does that provide some extra protection around my program where it cannot reach outside its own memory? Can it harm XCode?
4. Any recommendations on how to run my inherently buggy code safely?

I use OSX 10.7, Xcode 4.6.

In general, the OS will protect itself and other processes from your malfeasance. It's not something you necessarily want to rely on heavily, though. — Hot Licks, Mar 26 '13 at 20:48
Also, you will never "happen to reach" a file on your hard disk when accessing an array index out of bounds (in your RAM). — DrummerB, Mar 26 '13 at 20:49
I believe you are asking about C arrays, right? So that has nothing to do with ObjC and doesn't really relate to any IDE. — Bryan Chen, Mar 26 '13 at 20:51
xlc : I did mention my IDE XCode in my question, because I was suspecting it might protect my system from my buggy programs if I run them inside XCode, and hoping for input from you guys on that idea. Sorry to have been unclear — ChrisD, Mar 26 '13 at 21:11
@DrummerB thanks for the compliment, I was a bit nervous about my first SO question — ChrisD, Mar 26 '13 at 21:23
Here's my [favorite example](http://www.youtube.com/watch?v=ucQI5HpiFrI&t=42m47s) of weird results (it deals with the stack, but I found it really enlightening...). — phipsgabler, Mar 26 '13 at 22:34
@phg that fragment is indeed exactly the dangers I am talking about, but then not just when accessing memory my own program is using (as in the Stanford video), but reaching beyond that. — ChrisD, Mar 27 '13 at 10:20

Keith Thompson · Answer 1 · 2014-07-22T23:09:37.567

135

As far as the ISO C standard (the official definition of the language) is concerned, accessing an array outside its bounds has "undefined behavior". The literal meaning of this is:

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

A non-normative note expands on this:

Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

So that's the theory. What's the reality?

In the "best" case, you'll access some piece of memory that's either owned by your currently running program (which might cause your program to misbehave), or that's not owned by your currently running program (which will probably cause your program to crash with something like a segmentation fault). Or you might attempt to write to memory that your program owns, but that's marked read-only; this will probably also cause your program to crash.

That's assuming your program is running under an operating system that attempts to protect concurrently running processes from each other. If your code is running on the "bare metal", say if it's part of an OS kernel or an embedded system, then there is no such protection; your misbehaving code is what was supposed to provide that protection. In that case, the possibilities for damage are considerably greater, including, in some cases, physical damage to the hardware (or to things or people nearby).

Even in a protected OS environment, the protections aren't always 100%. There are operating system bugs that permit unprivileged programs to obtain root (administrative) access, for example. Even with ordinary user privileges, a malfunctioning program can consume excessive resources (CPU, memory, disk), possibly bringing down the entire system. A lot of malware (viruses, etc.) exploits buffer overruns to gain unauthorized access to the system.

(One historical example: I've heard that on some old systems with core memory, repeatedly accessing a single memory location in a tight loop could literally cause that chunk of memory to melt. Other possibilities include destroying a CRT display, and moving the read/write head of a disk drive with the harmonic frequency of the drive cabinet, causing it to walk across a table and fall onto the floor.)

And there's always Skynet to worry about.

The bottom line is this: if you could write a program to do something bad deliberately, it's at least theoretically possible that a buggy program could do the same thing accidentally.

In practice, it's very unlikely that your buggy program running on a MacOS X system is going to do anything more serious than crash. But it's not possible to completely prevent buggy code from doing really bad things.

edited Jul 22 '14 at 23:09

answered Mar 26 '13 at 21:20

Keith Thompson

thanks, I actually fully understand this. But it immediately triggers a follow-up question: what can a beginning programmer do, to protect his/her computer from his/her own possibly horrible creations? After I have tested a program thoroughly, I can unleash it in the world. But the first trial run is bound to be an incorrect program. How do you guys keep your systems safe from yourself? – ChrisD Mar 26 '13 at 21:27
@ChrisD: We tend to be lucky. 8-)} Seriously, OS-level protection is pretty good these days. Worst case, if I write an accidental [fork bomb](http://en.wikipedia.org/wiki/Fork_bomb), I might have to reboot to recover. But real damage to the system probably isn't worth worrying about, as long as your program isn't trying to do something on the edge of being dangerous. If you're really worried, running the program on a virtual machine might not be a bad idea. – Keith Thompson Mar 26 '13 at 21:32
On the other hand, I've seen lots of weird things happen on computers I've used (corrupted files, unrecoverable system errors, etc.), and I have no idea how many of them might have been caused by some C program exhibiting the dreaded undefined behavior. (So far no actual demons have flown out of my nose.) – Keith Thompson Mar 26 '13 at 21:33
thanks for teaching me fork bombs - I have done stuff close to that, when trying to grasp recursion :) – ChrisD Mar 26 '13 at 21:40
@ChrisD: Recursion within a program is likely to run into the stack size limit for your process. A fork bomb bypasses that limit by creating multiple processes, each of which doesn't use a lot of stack space. – Keith Thompson Mar 26 '13 at 21:42
Besides the rare security holes, all the threats in *nix are of Denial Of Service type: hog the CPU, fill RAM, fill the hard disk, send network PING requests, etc. You can set limits for your user accounts with e.g. `ulimit`, like max number of processes or disk quota. Most systems don't do it by default, I think. Even OpenBSD, which is supposed to be safe "out of the box", can be brought down with a simple C program by a normal user. – potrzebie Mar 26 '13 at 21:57
This answer is kind of missing something important: the compiler is allowed to do *anything* it wants in case an undefined behavior is executed. It can just delete the function for example. (Many other answers have the same problem as I noticed right now). – usr Apr 02 '13 at 21:35
@usr: That's implied by the "imposes no requirements" clause in the definition of "undefined behavior" -- but I'll probably update this answer to clarify the point. – Keith Thompson Apr 02 '13 at 21:38
http://www.scientificamerican.com/article/printers-can-be-hacked-to-catch-fire/ so fire is still possible with modern electronics. – Mooing Duck Jul 22 '14 at 23:19
Why is doing timed seeks on disk drives to make them walk a Bad Thing? We used to have races . . . – geometrian Oct 19 '14 at 07:42
Those historical behaviours, while amusing, have nothing to do with buffer overruns. – user253751 Sep 11 '15 at 12:11
@immibis: A buffer overrun could result in execution of arbitrary code. And as I wrote in my answer, if you could write a program to do something bad *deliberately*, it's at least theoretically possible that a buggy program could do the same thing *accidentally*. Melting core memory is admittedly unlikely on systems that don't have core memory, but hardware damage is still possible. – Keith Thompson Sep 15 '15 at 21:46
I was scared because I wrote a program to randomly modify bits. I feel secure now because I'm sure Arch Linux would have prevented the damage. – Confuse Nov 06 '16 at 08:22
Doesn't standard guarantee that access for 1 character (type of u/s char) beyond the right border is okay? – 0andriy Nov 28 '16 at 22:45
@AndyShevchenko: No. A pointer just past the end of an array is valid, but attempting to dereference it has undefined behavior. – Keith Thompson Nov 29 '16 at 04:55
@KeithThompson Yes, that's right in C++, but not in C as far as I understand the standard. Basically it guarantees that `char s[4] = "abcd"` will work. – 0andriy Nov 29 '16 at 22:42
@AndyShevchenko: That's an entirely separate issue. `char s[4] = "abcd";` works in C, but not in C++, because C has a special case rule: the literal `"abcd"` is always logically 5 bytes long (including the terminating `'\0'`), but it can be used to initialize a 4-element `char` array, dropping the terminator. In both C and C++, you can legally compute a pointer just past the end of an array object, but attempting to dereference it has undefined behavior. – Keith Thompson Nov 29 '16 at 23:36
It may also be worth noting that UB isn't limited to cases where a pointer goes outside the range of an allocation. Given `int arr[2][8]`, an attempt to access `arr[0][8]` may malfunction even though the Standard defines the behavior of `arr[0]+8` as yielding an address that matches that of addressable object `arr[1][0]`. – supercat May 24 '21 at 16:11
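
The one-past-the-end rule discussed in the comments above can be illustrated with a small sketch. The function name `sum_ints` is hypothetical and not from the thread; the point is only that the pointer `end` may be computed and compared against, while it is never dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the n ints starting at a (hypothetical helper for illustration).
   `end` points one past the last element: forming it and comparing
   against it is valid, dereferencing it would be undefined behavior. */
int sum_ints(const int *a, size_t n)
{
    assert(a != NULL);
    const int *end = a + n;          /* valid to form, never dereferenced */
    int total = 0;
    for (const int *p = a; p != end; ++p)
        total += *p;                 /* p is always inside [a, end) here */
    return total;
}
```

Called as `sum_ints(v, 4)` on `int v[4] = {1, 2, 3, 4}`, the loop reads `v[0]` through `v[3]` and stops as soon as `p` equals `end`.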

trumpetlicks · Answer 2 · 2021-03-17T18:45:53.993

In general, Operating Systems of today (the popular ones anyway) run all applications in protected memory regions using a virtual memory manager. It turns out that it is not terribly EASY (per se) to simply read or write to a location that exists in REAL space outside the region(s) that have been assigned / allocated to your process.

Direct answers:

1. Reading will almost never directly damage another process; however, it can indirectly damage a process if you happen to read a KEY value used to encrypt, decrypt, or validate a program / process. Reading out of bounds can have somewhat adverse / unexpected effects on your code if you are making decisions based on the data you are reading.
2. The only way you could really DAMAGE something by writing to a location accessible by a memory address is if the memory address you are writing to is actually a hardware register (a location that is not for data storage but for controlling some piece of hardware) rather than a RAM location. In fact, you still won't normally damage something unless you are writing to some one-time-programmable location that is not rewritable (or something of that nature).
3. Generally, running from within the debugger runs the code in debug mode. Running in debug mode does TEND to (but not always) stop your code faster when you have done something considered out of practice or downright illegal.
4. Never use macros, use data structures that already have array index bounds checking built in, etc.
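
As a rough illustration of the last recommendation, here is a minimal sketch of a fixed-capacity array with checked access. The type `IntArray`, the constant `INT_ARRAY_CAPACITY`, and both function names are made up for this example; it assumes the caller keeps `length` no larger than the capacity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INT_ARRAY_CAPACITY 8          /* hypothetical fixed capacity */

typedef struct {
    int    items[INT_ARRAY_CAPACITY];
    size_t length;                    /* valid elements, <= capacity */
} IntArray;

/* Read element `index` into *out; returns false instead of reading
   outside the valid range. */
bool int_array_get(const IntArray *arr, size_t index, int *out)
{
    assert(arr != NULL && out != NULL);
    if (index >= arr->length)
        return false;
    *out = arr->items[index];
    return true;
}

/* Write `value` at `index`; returns false instead of writing outside
   the valid range. */
bool int_array_set(IntArray *arr, size_t index, int value)
{
    assert(arr != NULL);
    if (index >= arr->length)
        return false;
    arr->items[index] = value;
    return true;
}
```

The caller has to check the return value, which is the price of the design: an out-of-range index becomes an ordinary error path rather than undefined behavior.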

ADDITIONAL: I should add that the above information really only applies to systems using an operating system with memory protection windows. If you are writing code for an embedded system, or even for a system whose operating system (real-time or other) does not have memory protection windows (or virtual addressed windows), you should practice a lot more caution in reading and writing to memory. Also, in these cases SAFE and SECURE coding practices should always be employed to avoid security issues.

Safe and secure coding practices should **always** be employed. — Nik Bougalis, Mar 26 '13 at 21:05
I would suggest NOT using try/catch for buggy code unless you catch very specific exceptions and know how to recover from them. Catch(...) is the worst thing you can add to a buggy code. — Eugene, Mar 26 '13 at 21:05
@NikBougalis - I completely agree, but it is EVEN MORE IMPORTANT if the OS doesnt include memory protection / virtual address spaces, or there is a lack of OS :-) — trumpetlicks, Mar 26 '13 at 21:07
@Eugene - I have never noticed that to be a problem for me, but I agree with you, so I have edited it out :-) — trumpetlicks, Mar 26 '13 at 21:09
1) you mean damage because I would be revealing something that should have stayed secret? 2) I am not sure I get what you mean, but I guess I am only accessing RAM with trying to access locations outside of array bounds? — ChrisD, Mar 26 '13 at 21:13
@ChrisD, with virtual memory, accessing an array _in_ bounds could lead to accessing the hard drive. That's within the OS's control. — Shannon Severance, Apr 02 '13 at 20:59
Nobody asked or cares, but it's "per se", not "per say": from the Latin meaning "by itself". — josaphatv, Mar 16 '21 at 16:25

score 11 · Answer 3 · answered Mar 27 '13 at 11:50

Not checking bounds can lead to ugly side effects, including security holes. One of the ugly ones is arbitrary code execution. The classic example: if you have a fixed-size array and use strcpy() to put a user-supplied string there, the user can give you a string that overflows the buffer and overwrites other memory locations, including the return address that the CPU jumps to when your function finishes.

This means your user can send you a string that will cause your program to essentially call exec("/bin/sh"), which turns it into a shell that executes anything they want on your system, including harvesting all your data and turning your machine into a botnet node.

See Smashing The Stack For Fun And Profit for details on how this can be done.
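
For contrast with the unchecked strcpy() case described above, a minimal sketch of a copy that refuses input that does not fit might look like this. The helper name `copy_if_fits` is hypothetical; rejecting over-long input (rather than truncating it) is a design choice made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst only if the whole string, including its
   terminating '\0', fits in dst_size bytes. */
bool copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t needed = strlen(src) + 1;   /* +1 for the terminator */
    if (needed > dst_size)
        return false;                  /* reject instead of overflowing */
    memcpy(dst, src, needed);
    return true;
}
```

With `char buf[8]`, `copy_if_fits(buf, sizeof buf, input)` leaves `buf` untouched whenever `input` is longer than seven characters, so nothing beyond the array is ever written.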

che

I know that I should not be accessing array elements beyond the bounds, thanks for reinforcing that point. But the question is, besides doing all sorts of harm to my program, can I inadvertently reach beyond the memory of my program? And I mean on OSX. – ChrisD Mar 27 '13 at 22:47
@ChrisD: OS X is a modern operating system, so it will provide you full memory protection. I.e., you should be limited to what your program is allowed to do. This should not include messing with other processes (unless you're running under root privileges). – che Mar 28 '13 at 10:53
I'd rather say under ring 0 privileges, not root ones. – Ruslan Jul 10 '15 at 09:20
More interesting is that hyper-modern compilers may decide that if code tries to read `foo[0]` through `foo[len-1]` after having previously used a check of `len` against the array length to either execute or skip a piece of code, the compiler should feel free to run that other code unconditionally even if the application owns the storage past the array and the effects of reading it would have been benign, but the effect of invoking the other code would not be. – supercat Jun 23 '16 at 16:38

Udo Klein · Answer 4 · 2021-08-06T10:39:51.147

You write:

I read a lot of 'anything can happen', 'segmentation might be the least bad problem', 'your harddisk might turn pink and unicorns might be singing under your window', which is all nice, but what is really the danger?

Let's put it this way: load a gun. Point it outside the window without any particular aim and fire. What is the danger?

The issue is that you do not know. If your code overwrites something that crashes your program, you are fine, because the crash stops it in a defined state. If it does not crash, however, the issues start to arise. Which resources are under the control of your program, and what might it do to them? I know of at least one major issue that was caused by such an overflow. The issue was in a seemingly meaningless statistics function that messed up some unrelated conversion table for a production database. The result was some very expensive cleanup afterwards. Actually, it would have been much cheaper and easier to handle if this issue had formatted the hard disks. In other words: pink unicorns might be the least of your problems.

The idea that your operating system will protect you is optimistic. If possible try to avoid writing out of bounds.

ok, this was exactly what I was afraid of. I will 'try to avoid writing out of bounds' but, seeing what I have been doing the last few months, I will surely be doing it a lot still. How did you guys get so good in programming without a safe way to practice? — ChrisD, Mar 26 '13 at 21:05

mikyra · Answer 5 · 2013-03-26T20:56:51.067

8

If you don't run your program as root or any other privileged user, it won't harm your system, so generally this is a good idea.

By writing data to some random memory location you won't directly "damage" any other program running on your computer, as each process runs in its own memory space.

If you try to access any memory not allocated to your process the operating system will stop your program from executing with a segmentation fault.

So directly (without running as root and directly accessing files like /dev/mem) there is no danger that your program will interfere with any other program running on your operating system.

Nevertheless - and probably this is what you have heard about in terms of danger - by blindly writing random data to random memory locations by accident you sure can damage anything you are able to damage.

For example your program might want to delete a specific file given by a file name stored somewhere in your program. If by accident you just overwrite the location where the file name is stored you might delete a very different file instead.

edited Mar 26 '13 at 20:56

answered Mar 26 '13 at 20:50

mikyra

If you *are* running as root (or some other privileged user), though, watch out. Buffer and array overruns are a common malware exploit. – John Bode Mar 26 '13 at 20:52
actually the account I use for all my daily computing is not an administrator account (I use the OSX terminology since that is my system). Do you mean to tell me I cannot possibly damage something by trying to set ANY memory location? That is actually great news! – ChrisD Mar 26 '13 at 21:08
As already mentioned, the worst harm you can do by accident is the worst harm you can do as a user. If you want to be 100% sure not to destroy any of your data, you might want to add a different account to your computer and experiment with that. – mikyra Mar 26 '13 at 21:14
ah that about making a separate user, I understand. But the question remains: how can I do better than that, and even then make sure I do not damage stuff beside my program that I am debugging? How did you guys learn to program, where was your safety net of not killing your computer in the process? ;) – ChrisD Mar 26 '13 at 21:20
To restate again - running with your user account there is no risk at all that you will damage any part of the system you are running as you don't have enough privileges to do any serious harm. – mikyra Mar 26 '13 at 21:45
@mikyra: That's true only if the system's protective mechanisms are 100% effective. The existence of malware suggests that you can't always rely on that. (I don't want to suggest that that's necessarily worth worrying about; it's possible, but unlikely, that a program could *accidentally* exploit the same security holes exploited by malware.) – Keith Thompson Mar 26 '13 at 22:25
@KeithThompson : Yes, that's true. But especially regarding malware my impression is that in 99 of 100 cases it's rather the user voluntarily "inviting" it by violating even most basic security precautions and only a very low percentage where protection mechanisms really fail. – mikyra Mar 26 '13 at 22:37
The list here includes: Running code from untrusted sources. Just clicking the OK button on any firewall popup without even reading what it's about, or shutting the firewall down completely if the desired network connection can't be made. Patching binaries with the newest hack from dubious sources. It's not the fault of the vault if the owner voluntarily invites any burglar in with open arms and the extra-strong fortified doors wide open. – mikyra Mar 26 '13 at 22:52
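
Beyond running under an unprivileged account, the `ulimit` idea mentioned in an earlier comment can also be applied from inside a test program. The following is only a sketch, assuming a POSIX system that provides `getrlimit`/`setrlimit` and `RLIMIT_CPU`; the function name `cap_cpu_seconds` is made up for this example. It lowers the soft CPU-time limit so that a runaway loop is stopped by the system after a bounded amount of CPU time.

```c
#include <assert.h>
#include <sys/resource.h>

/* Lower this process's soft CPU-time limit to at most `seconds`.
   Never raises an existing stricter limit. Returns 0 on success,
   -1 on failure (errno is set by the failing system call). */
int cap_cpu_seconds(rlim_t seconds)
{
    assert(seconds > 0);                /* a zero limit would stop us at once */
    struct rlimit lim;
    if (getrlimit(RLIMIT_CPU, &lim) != 0)
        return -1;
    if (lim.rlim_cur != RLIM_INFINITY && lim.rlim_cur <= seconds)
        return 0;                       /* already at least this strict */
    if (lim.rlim_max != RLIM_INFINITY && seconds > lim.rlim_max)
        seconds = lim.rlim_max;         /* soft limit cannot exceed hard limit */
    lim.rlim_cur = seconds;
    return setrlimit(RLIMIT_CPU, &lim);
}
```

Calling `cap_cpu_seconds(10)` near the start of `main` would be one way to use it. This limits resource hogging only; it does nothing about a program corrupting its own data.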

score 4 · Answer 6 · answered Mar 26 '13 at 20:54

NSArrays in Objective-C are assigned a specific block of memory. Exceeding the bounds of the array means that you would be accessing memory that is not assigned to the array. This means:

This memory can have any value. There's no way of knowing if the data is valid based on your data type.
This memory may contain sensitive information such as private keys or other user credentials.
The memory address may be invalid or protected.
The memory can have a changing value because it's being accessed by another program or thread.
Other things use memory address space, such as memory-mapped ports.
Writing data to unknown memory address can crash your program, overwrite OS memory space, and generally cause the sun to implode.

From the aspect of your program you always want to know when your code is exceeding the bounds of an array. This can lead to unknown values being returned, causing your application to crash or provide invalid data.

`NSArrays` have out of bounds exceptions. And this question seems to be about C arrays. — DrummerB, Mar 26 '13 at 20:59
I indeed meant C arrays. I know there is NSArray, but for now most of my exercises are in C — ChrisD, Mar 26 '13 at 21:04

score 4 · Answer 7 · edited Jun 20 '20 at 09:12

You may want to try using the memcheck tool in Valgrind when you test your code -- it won't catch individual array bounds violations within a stack frame, but it should catch many other sorts of memory problem, including ones that would cause subtle, wider problems outside the scope of a single function.

From the manual:

Memcheck is a memory error detector. It can detect the following problems that are common in C and C++ programs.

Accessing memory you shouldn't, e.g. overrunning and underrunning heap blocks, overrunning the top of the stack, and accessing memory after it has been freed.

Using undefined values, i.e. values that have not been initialised, or that have been derived from other undefined values.

Incorrect freeing of heap memory, such as double-freeing heap blocks, or mismatched use of malloc/new/new[] versus free/delete/delete[]

Overlapping src and dst pointers in memcpy and related functions.

Memory leaks.

ETA: Though, as Kaz's answer says, it's not a panacea, and doesn't always give the most helpful output, especially when you're using exciting access patterns.

I would suspect the Analyzer of XCode would find most of that? and my question is not so much how to find these bugs, but if executing a program that still has these bugs is dangerous to memory not allocated to my program. I will have to execute the program in order to see the bugs happening — ChrisD, Mar 27 '13 at 10:24

score 3 · Answer 8 · answered Apr 04 '13 at 01:53

If you ever do systems level programming or embedded systems programming, very bad things can happen if you write to random memory locations. Older systems and many micro-controllers use memory mapped IO, so writing to a memory location that maps to a peripheral register can wreak havoc, especially if it is done asynchronously.

An example is programming flash memory. Programming mode on the memory chips is enabled by writing a specific sequence of values to specific locations inside the address range of the chip. If another process were to write to any other location in the chip while that was going on, it would cause the programming cycle to fail.

In some cases the hardware will wrap addresses around (most significant bits/bytes of address are ignored) so writing to an address beyond the end of the physical address space will actually result in data being written right in the middle of things.
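
The wrap-around effect can be modeled in a few lines. This is a simplified sketch, not a description of any particular chip: it assumes a bus that decodes only the low `addr_bits` address lines, and the function name `effective_address` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model a bus that decodes only the low `addr_bits` address lines.
   Higher bits are ignored, so an out-of-range address aliases a
   location inside the physical range. */
uint32_t effective_address(uint32_t addr, unsigned addr_bits)
{
    assert(addr_bits >= 1 && addr_bits <= 32);
    if (addr_bits == 32)
        return addr;                              /* avoid shifting by 32 */
    uint32_t mask = (UINT32_C(1) << addr_bits) - 1;
    return addr & mask;
}
```

With 16 decoded address lines, a write aimed at `0x10004` (just past the end of a 64 KiB space) lands on `0x0004`, near the start of memory.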

And finally, older CPUs like the MC68000 can lock up to the point that only a hardware reset can get them going again. I haven't worked on them for a couple of decades, but I believe that when the CPU encountered a bus error (non-existent memory) while trying to handle an exception, it would simply halt until the hardware reset was asserted.

My biggest recommendation is a blatant plug for a product, but I have no personal interest in it and I am not affiliated with them in any way - but based on a couple of decades of C programming and embedded systems where reliability was critical, Gimpel's PC Lint will not only detect those sort of errors, it will make a better C/C++ programmer out of you by constantly harping on you about bad habits.

I'd also recommend reading the MISRA C coding standard, if you can snag a copy from someone. I haven't seen any recent ones but in ye olde days they gave a good explanation of why you should/shouldn't do the things they cover.

Dunno about you, but about the 2nd or 3rd time I get a coredump or hangup from any application, my opinion of whatever company produced it goes down by half. The 4th or 5th time and whatever the package is becomes shelfware and I drive a wooden stake through the center of the package/disc it came in just to make sure it never comes back to haunt me.

Depending upon the system, out-of-range *reads* may also trigger unpredictable behavior, or they may be benign, though benign hardware behavior on out-of-range loads does not imply benign compiler behavior. — supercat, Jun 23 '16 at 16:28

score 2 · Answer 9 · answered Mar 26 '13 at 23:02

I'm working with a compiler for a DSP chip which deliberately generates code that accesses one past the end of an array out of C code which does not!

This is because the loops are structured so that the end of an iteration prefetches some data for the next iteration. So the datum prefetched at the end of the last iteration is never actually used.

Writing C code like that invokes undefined behavior, but that is only a formality from a standards document which concerns itself with maximal portability.

More often than not, a program which accesses out of bounds is not cleverly optimized. It is simply buggy. The code fetches some garbage value and, unlike the optimized loops of the aforementioned compiler, the code then uses the value in subsequent computations, thereby corrupting them.

It is worth catching bugs like that, and so it is worth making the behavior undefined for even just that reason alone: so that the run-time can produce a diagnostic message like "array overrun in line 42 of main.c".
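
Standard C does not require an implementation to produce such a diagnostic, but a program can approximate one by hand. The sketch below is an assumption-laden illustration: `index_in_bounds`, `INDEX_IN_BOUNDS`, and `last_diagnostic` are names invented here, and the message is stored in a buffer rather than printed so the caller decides what to do with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static char last_diagnostic[128];     /* most recent overrun report, if any */

/* Returns true if index is valid for an array of `length` elements.
   Otherwise records a message naming the source location. */
bool index_in_bounds(size_t index, size_t length, const char *file, int line)
{
    assert(file != NULL);
    if (index < length)
        return true;
    snprintf(last_diagnostic, sizeof last_diagnostic,
             "array overrun in line %d of %s", line, file);
    return false;
}

/* Convenience wrapper that fills in the caller's file and line. */
#define INDEX_IN_BOUNDS(index, length) \
    index_in_bounds((index), (length), __FILE__, __LINE__)
```

A typical use would be `if (INDEX_IN_BOUNDS(i, n)) value = a[i];`, with the failure branch logging `last_diagnostic`.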

On systems with virtual memory, an array could happen to be allocated such that the address which follows is in an unmapped area of virtual memory. The access will then bomb the program.

As an aside, note that in C we are permitted to create a pointer which is one past the end of an array. And this pointer has to compare greater than any pointer to the interior of an array. This means that a C implementation cannot place an array right at the end of memory, where the one plus address would wrap around and look smaller than other addresses in the array.

Nevertheless, access to uninitialized or out of bounds values is sometimes a valid optimization technique, even if not maximally portable. This is for instance why the Valgrind tool does not report accesses to uninitialized data when those accesses happen, but only when the value is later used in some way that could affect the outcome of the program. You get a diagnostic like "conditional branch in xxx:nnn depends on uninitialized value" and it can sometimes be hard to track down where it originates. If all such accesses were trapped immediately, there would be a lot of false positives arising from compiler optimized code as well as correctly hand-optimized code.

Speaking of which, I was working with some codec from a vendor which was giving off these errors when ported to Linux and run under Valgrind. But the vendor convinced me that only several bits of the value being used actually came from uninitialized memory, and those bits were carefully avoided by the logic. Only the good bits of the value were being used and Valgrind doesn't have the ability to track down to the individual bit. The uninitialized material came from reading a word past the end of a bit stream of encoded data, but the code knows how many bits are in the stream and will not use more bits than there actually are. Since the access beyond the end of the bit stream array does not cause any harm on the DSP architecture (there is no virtual memory after the array, no memory-mapped ports, and the address does not wrap) it is a valid optimization technique.
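
For comparison, a strictly portable bit reader tracks the number of valid bits and never touches storage beyond them. This is not the vendor's code, only a sketch of the bookkeeping described above: the `BitReader` type, the `read_bits` name, and the most-significant-bit-first order are all assumptions made for the example, and it trades the word-at-a-time speed of the DSP version for defined behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *data;     /* encoded bytes */
    size_t         bit_len;  /* number of valid bits in data */
    size_t         pos;      /* next bit to read, always <= bit_len */
} BitReader;

/* Read `count` bits (MSB first) into *out. Returns false, reading
   nothing, if fewer than `count` valid bits remain. */
bool read_bits(BitReader *r, unsigned count, uint32_t *out)
{
    assert(r != NULL && out != NULL && r->pos <= r->bit_len);
    if (count > 32 || count > r->bit_len - r->pos)
        return false;
    uint32_t value = 0;
    for (unsigned i = 0; i < count; ++i) {
        size_t bit = r->pos + i;
        unsigned b = (r->data[bit / 8] >> (7 - bit % 8)) & 1u;
        value = (value << 1) | b;
    }
    r->pos += count;
    *out = value;
    return true;
}
```

Because every byte index is derived from a bit position below `bit_len`, the reader stays inside the array as long as `bit_len` does not exceed eight times the array's size.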

"Undefined behavior" does not really mean much, because according to ISO C, simply including a header which is not defined in the C standard, or calling a function which is not defined in the program itself or the C standard, are examples of undefined behavior. Undefined behavior doesn't mean "not defined by anyone on the planet" just "not defined by the ISO C standard". But of course, sometimes undefined behavior really is absolutely not defined by anyone.

Additionally, provided that there exists at least one program which a particular implementation processes correctly even though it nominally taxes all the implementation limits given in the Standard, that implementation could behave arbitrarily when fed any other program which is free of constraint violations and still be "compliant". Consequently, 99.999% of C programs (anything other than a platform's "one program") rely upon behaviors where the Standard imposes no requirements. — supercat, Jul 11 '16 at 17:12
Not a vendor that should be trusted, it seems. Generally `memcheck` _is_ bit accurate. That includes bitfields and binary logical operators. Not all integer arithmetic operators are tracked with bit accuracy; you need the `--expensive-definedness-checks=yes` option for that. Floating point isn't tracked for definedness. — Paul Floyd, Mar 02 '23 at 09:14

jbgs · Answer 10 · 2013-03-26T21:01:23.883

1

Besides your own program, I don't think you will break anything. In the worst case you will try to read or write a memory address that corresponds to a page the kernel didn't assign to your process, generating the proper exception and being killed (I mean, your process).

edited Mar 26 '13 at 21:01

answered Mar 26 '13 at 20:49

jbgs

..What? How about overwriting memory in your own process used to store some variable used later... which has now mysteriously changed its value! Those bugs are loads of fun to track down, I assure you. A segfault would be the *best* outcome. -1 – Ed S. Mar 26 '13 at 20:59
2

I mean he won't "break" other processes, besides his own program ;) – jbgs Mar 26 '13 at 21:00
I indeed do not care if I break my own program. I am just learning, the program is obviously wrong anyway if I access anything out of bounds of my array. I am just getting more and more worried about the risks of breaking something else while debugging my creations – ChrisD Mar 26 '13 at 21:09
The thing is: can I be certain if I try to access memory not assigned to me, that my process will be killed? (being on OSX) – ChrisD Mar 26 '13 at 21:21
3

Years ago, I used to be a clumsy C programmer. I accessed arrays outside their bounds hundreds of times. Besides my process being killed by the operating system, nothing ever happened. – jbgs Mar 26 '13 at 21:22
@ChrisD: You're running in protected mode. Your app is sandboxed. You're not going to bring the system down. – Ed S. Mar 26 '13 at 21:24
@ChrisD If you could alter things in the OS that your user isn't supposed to be able to alter, you would have found a security hole in the OS. – potrzebie Mar 26 '13 at 21:26
@EdS. If I understand you correctly, you say that my program is actually guaranteed to be killed when it tries to access RAM not assigned to itself? (and I understand that 'guaranteed' means 'not taking into account bugs in the operating system') – ChrisD Mar 26 '13 at 21:31
@ChrisD: No, I never said that. You are invoking undefined behavior, i.e., the behavior of *your program* is now undefined. However, this is not a new problem, and most operating systems and CPUs protect you from yourself. There are of course exceptions: real mode applications, many embedded environments, kernel mode applications (i.e., drivers and the OS itself). These processes can bork everything else, but you're not in that playground. – Ed S. Mar 26 '13 at 21:32
@EdS. Thanks for the clarification! So you mean my program could do the strangest stuff to itself, have really weird or normal results, or maybe get killed, but other processes will not be influenced (hoping I got it right this time :) ) – ChrisD Mar 26 '13 at 21:34

score 0 · Answer 11 · answered May 24 '21 at 16:01

Arrays with two or more dimensions pose a consideration beyond those mentioned in other answers. Consider the following functions:

char arr1[2][8];
char arr2[4];
int test1(int n)
{
  arr1[1][0] = 1;
  for (int i=0; i<n; i++) arr1[0][i] = arr2[i];      
  return arr1[1][0];
}
int test2(int ofs, int n)
{
  arr1[1][0] = 1;
  for (int i=0; i<n; i++) *(arr1[0]+i) = arr2[i];      
  return arr1[1][0];
}

The way gcc processes the first function does not allow for the possibility that a write to arr1[0][i] might affect the value of arr1[1][0], and the generated code is incapable of returning anything other than a hardcoded value of 1. Although the Standard defines the meaning of array[index] as precisely equivalent to (*((array)+(index))), gcc seems to interpret the notion of array bounds and pointer decay differently in cases which involve using the [] operator on values of array type, versus those which use explicit pointer arithmetic.
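
If the intent really is to fill a two-dimensional array across its rows, one way to avoid indexing past the end of an inner row is to address the array as a whole. A minimal sketch follows; the names grid and fill_grid are made up for illustration, and the approach assumes that copying into the complete array object with memcpy is acceptable for your use case.

```c
#include <stddef.h>
#include <string.h>

char grid[2][8];   /* same shape as arr1 in the answer above */

/* Hypothetical helper: copies n bytes from src into grid, starting at
   grid[0][0] and continuing into grid[1] when n is greater than 8.
   Returns 0 on success, or -1 if n exceeds the size of the whole array. */
int fill_grid(const char *src, size_t n)
{
    if (n > sizeof grid)
        return -1;          /* refuse instead of writing past the array */
    memcpy(grid, src, n);   /* destination is the entire 16-byte object */
    return 0;
}
```

Because the destination is the whole 16-byte object rather than row 0, the compiler has no grounds to assume grid[1][0] is untouched, and the explicit size check keeps the copy inside the array.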

AnArrayOfFunctions · Answer 12 · 2022-12-23T00:52:36.073

I just want to add some practical examples to this question. Imagine the following code:

#include <stdio.h>

int main(void) {
    int n[5];
    n[5] = 1;

    printf("answer %d\n", n[5]);

    return (0);
}

This has Undefined Behaviour. If you enable optimisations, for example with clang's -Ofast, it can print something like:

answer 748418584

(Compiled without optimisations, it will probably print the expected answer 1.)

This is because, in the optimised build, the assignment of 1 never makes it into the generated code (you can check the assembly on godbolt as well).

(However, it must be noted that by that logic main should not even call printf, so the best advice is not to depend on the optimiser to solve your UB, but rather to know that it may sometimes work this way.)

The takeaway here is that modern optimising C compilers assume undefined behaviour (UB) never occurs. The above code is therefore treated as something similar to (but not the same as) the following:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n[5];

    if (0)
        n[5] = 1;

    printf("answer %d\n", (exit(-1), n[5]));

    return (0);
}

which, by contrast, is perfectly well defined.

That's because the condition of the first if statement is never true (0 is always false).

And in the second argument to printf, the comma operator evaluates exit(-1) first, with a sequence point before its right operand. The program terminates inside exit, so the out-of-bounds n[5] on the right-hand side is never evaluated (so it's well defined).

So the second takeaway is that UB is not UB as long as it's never actually evaluated.
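
The same idea can be used on purpose: if the range check comes first, the out-of-bounds expression is never evaluated and no UB occurs. A minimal sketch, with made-up helper names (checked_set and checked_get are not from any library):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores value at a[idx] only when idx is in range.
   Returns false, without touching a[idx], when idx is out of bounds. */
bool checked_set(int *a, size_t len, size_t idx, int value)
{
    if (idx >= len)
        return false;
    a[idx] = value;
    return true;
}

/* Hypothetical helper: reads a[idx] into *out only when idx is in range.
   Leaves *out unchanged and returns false otherwise. */
bool checked_get(const int *a, size_t len, size_t idx, int *out)
{
    if (idx >= len)
        return false;
    *out = a[idx];
    return true;
}
```

With int n[5], a call such as checked_set(n, 5, 5, 1) simply returns false instead of writing past the array. The cost is one comparison per access, and the caller has to pass the correct length.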

Additionally, something I don't see mentioned here: there is a fairly modern Undefined Behaviour sanitiser (at least in clang) which, with the option -fsanitize=undefined, gives the following output on the first example (but not the second):

/app/example.c:5:5: runtime error: index 5 out of bounds for type 'int[5]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /app/example.c:5:5 in 
/app/example.c:7:27: runtime error: index 5 out of bounds for type 'int[5]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /app/example.c:7:27 in

Here are all the samples on godbolt:

https://godbolt.org/z/eY9ja4fdh (first example and no flags)

https://godbolt.org/z/cGcY7Ta9M (first example and -Ofast clang)

https://godbolt.org/z/cGcY7Ta9M (second example and UB sanitiser on)

https://godbolt.org/z/vE531EKo4 (first example and UB sanitiser on)
