0

Even though it is bad practice, is there any way the following code could cause trouble in real life? Note that I am only reading out of bounds, not writing:

#include <iostream>

int main() {
  int arr[] = {1, 2, 3};
  std::cout << arr[3] << '\n';
}
Aykhan Hagverdili
Kleysley
  • No, it is not safe. – Evg Jul 31 '21 at 06:16
  • Yeah thanks but why? – Kleysley Jul 31 '21 at 06:17
  • Does this answer your question? [Array index out of bound behavior](https://stackoverflow.com/questions/671703/array-index-out-of-bound-behavior) – phuclv Jul 31 '21 at 06:17
  • Not really, because in my example I am only reading what's in memory afterwards, not setting anything. So what would be the risk in reading another program's memory? – Kleysley Jul 31 '21 at 06:19
  • You can't read other programs' memory; each process has a separate address space, and reading beyond what is allowed is undefined behavior. No one knows if that address is valid or not. Duplicates: [Access array beyond the limit in C and C++](https://stackoverflow.com/q/18727022/995714), [How dangerous is it to access an array out of bounds?](https://stackoverflow.com/q/15646973/995714) – phuclv Jul 31 '21 at 06:21
  • Maybe it's not clear what your question means by "safe". – aschepler Jul 31 '21 at 06:23
  • How could it be safe to interpret some garbage memory bytes as `std::string`, which has some non-trivial internal structure (length, pointer, SSO, etc.)? This question would have made some sense for primitive types like `char`, but not for `std::string`. – Evg Jul 31 '21 at 06:25
  • By "safe" I mean that if my program finishes everything is the same (no software (and hardware?) is damaged and the other programs run like I never ran my code) – Kleysley Jul 31 '21 at 06:25
  • @Evg Why can you be sure that it's garbage and why would it be different with a primitive? – Kleysley Jul 31 '21 at 06:27
  • You can't damage your PC or other software on the PC by simply reading out of bounds of your own app's memory, no. But you can potentially corrupt your own app's memory, for instance if you happen to read into a guard page then [bad things can happen](https://devblogs.microsoft.com/oldnewthing/20060927-07/?p=29563). – Remy Lebeau Jul 31 '21 at 06:29
  • @phuclv _each process has a separate address space_ That's true in all modern operating systems (which is probably what OP is using). It's worth noting that it can be different in some scenarios, e.g. devices that don't even have an OS or the concept of a "process". – Gian Paolo Jul 31 '21 at 06:29
  • @RemyLebeau If I corrupt the memory of my own program, will it be corrupted after stopping and restarting the C++ program? – Kleysley Jul 31 '21 at 06:31
  • `std::string` allocates long strings dynamically and keeps a pointer to that buffer. If you interpret random bytes as `std::string`, it could happen that you dereference an invalid pointer when you try to output that string. You have no guarantees about what's inside `a[3]`. Trying to read that memory could result in reading garbage bytes, a segmentation fault, whatever... The behaviour is undefined. The only thing one can be sure about is that `a[3]` is not a valid `std::string` object. – Evg Jul 31 '21 at 06:37
  • Nobody is giving a straight answer... No, doing this doesn't cause any lasting problems on modern OSes. It affects your program only, until it's closed. – HolyBlackCat Jul 31 '21 at 06:49
  • @Kleysley "*If I corrupt the memory of my own program, will it be corrupted after stopping and restarting the C++ program?*" - no. Everything is reset with new memory each time the program is started. – Remy Lebeau Jul 31 '21 at 06:52
  • @HolyBlackCat Thank you very much, if you write this as an answer I will mark it as the accepted one. – Kleysley Jul 31 '21 at 06:52
  • If you care about "what can happen", then study the generated machine code. Live demo: https://godbolt.org/z/Msb8MTYhx. You can see that at the beginning of `main`, 24 bytes are reserved on the stack, and 4 of these bytes are read and sent to `std::cout`. This implies that some "random" number is printed. Of course, with another implementation or configuration, different machine code may be produced. – Daniel Langr Jul 31 '21 at 12:24

3 Answers

2

As mentioned, it is not "safe" to read beyond the end of the stack. But it sounds like you're really asking what could go wrong, and typically the answer is "not much". Your program would ideally crash with a segfault, but it might just keep on happily running, unaware that it has entered undefined behavior. The results of such a program would be garbage, of course, but nothing's going to catch on fire (probably...).

People mistakenly write code with undefined behavior all the time, and a lot of effort has been spent trying to help them catch such issues and minimize their harm. Programs run in user space cannot affect other programs on the same machine thanks to isolated address spaces and other features, and software like sanitizers can help detect UB and other issues during development. Typically you can just fix the issue and move on to more important things.

That said, UB is, as the name suggests, undefined, which means your computer is allowed to do whatever it wants once you ask it to execute UB. It could format your hard drive, fry your processor, or even "make demons fly out of your nose". A reasonable computer wouldn't do those things, but it could.

The most significant issue with a program that enters UB is simply that it's not going to do what you wanted it to do. If you are trying to delete /foo but you read off the end of the stack you might end up passing /bar to your delete function instead. And if you access memory that an attacker also has access to you could wind up executing code on their behalf. A large number of major security vulnerabilities boil down to some line of code that triggers UB in just the wrong way that a malicious user can take advantage of.

dimo414
  • 47,227
  • 18
  • 148
  • 244
  • I would add that undefined behavior exists only from the perspective of the C++ standard. Once a C++ compiler translates source code into machine code, the behavior of the resulting program is very well defined by that machine code. – Daniel Langr Jul 31 '21 at 12:19
1

Depends on what you mean by stack. If you mean the whole stack, then no, you can't do that; it will lead to a segmentation fault. Not because the memory of other processes is there (that's not how it works), but because there is NOTHING mapped there. You can see this heuristically by looking at the addresses the program uses. The stack, for example, is at ~0x7f7d4af48040, which is far beyond the amount of memory any computer would have. The memory your program sees is virtual memory, which is different from the physical memory.

If you mean reading beyond the stack frame of the current function: yes, you can technically do that safely. Here is an example:

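#include <cstddef>
#include <cstdlib>
#include <iostream>

// Walks the chain of saved frame pointers (x86-64, GCC/Clang only).
// Build with -fno-omit-frame-pointer so that rbp really holds the frame base.
void stacktrace(){
        std::cerr << "Received SIGSEGV. Stack trace:\n";
        void** bp;
        // Copy the current frame pointer (rbp) into bp.
        asm volatile("mov %%rbp, %0" : "=r" (bp));
        std::size_t i = 0;
        while(true){
                // bp[1] is the return address stored just above the saved rbp.
                std::cerr << "[" << i++ << "] " << bp[1] << '\n';
                // Caller frames sit at higher addresses; stop once the chain
                // ends or no longer moves upward.
                if(static_cast<void*>(bp) >= *bp) break;
                bp = (void**) *bp;
        }
        std::exit(1);
}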

This is a very basic function I wrote to see whether I could generate a stack trace manually. It might not be obvious if you are unfamiliar with it, but on x64 the address contained in rbp is the base of the current stack frame (as long as the compiler keeps a frame pointer). In C++, the stack frame would look like:

return address
saved rbp of the caller <- rbp points here
local variables (and possibly other data, such as a stack cookie)
...
local variables <- rsp points here [rsp = stack pointer]

Addresses decrease as you go down this list. In the example above, I take the value of rbp and then follow the saved rbp values, each of which points outside the current stack frame into a caller's frame. So you can read from memory beyond the stack frame, but you generally shouldn't, and even so, why would you want to?

Note: Evg pointed this out. If you read some object from beyond the stack, that will probably trigger a segfault, depending on the object type, so this should only be done if you are very sure of what you're doing.

Lala5th
  • 1,137
  • 7
  • 18
  • Thanks, but I was wondering if my code would be able to corrupt/break anything other than itself. Can I run it 50 times without any consequences? – Kleysley Jul 31 '21 at 06:39
  • @Kleysley The kernel generally keeps programs from doing dangerous things (receiving a segfault is basically the kernel saying that you shouldn't do that and shutting you down before you can do it) – Lala5th Jul 31 '21 at 06:43
0

If you don't own the memory, or you do own it but haven't initialized it, you are not allowed to read it. This might seem like a pedantic and useless rule. After all, the memory is there and I am not trying to overwrite anything, right? What is a byte among friends? Let me read it.

The point is that C++ is a high-level language. The compiler only tries to interpret what you have coded and translate it to assembly. If you type in nonsense, you will get nonsense out. It's a bit like forcing someone to translate "askjds" from English to German.

But does this ever cause problems in real life? I roughly know what asm instructions are going to be generated. Why bother?

This video talks about a bug in Facebook's string implementation, where they read a byte of uninitialized memory that they did own, yet it still caused a very hard-to-find bug.

The point is that silicon is not intuitive. Do not rely on your intuitions.

Aykhan Hagverdili