24

From "Effective C++ 3rd edition by Scott Meyers":

> To emphasize that the results of undefined behavior are not predictable and may be very unpleasant, experienced C++ programmers often say that programs with undefined behavior can erase your hard drive.

Under what circumstances can this happen?

For example, can accessing and writing to locations out of array range corrupt memory that doesn't belong to this C++ program or thread?

Oleksiy
  • @ShumailMohy-ud-Din yes, correct. By Scott Meyers – Oleksiy Aug 29 '13 at 08:40
  • Take an example. There is a function which erases a hard drive. Now, you have just assigned some integer to a function pointer, which unluckily happens to be the address of the function which erases the hard drive. If you call the function by dereferencing the function pointer, the HDD is erased. – phoxis Aug 29 '13 at 08:41
  • Quite a few times when invoking UB I have erased vital parts of the system's flash memory I work with. – PlasmaHH Aug 29 '13 at 08:42
  • I nearly gave myself laser eye surgery with a barcode scanner when it invoked UB, and I know a man whose monitor caught fire. But I've never seen a hard drive erased. It could happen, though. – doctorlove Aug 29 '13 at 08:42
  • You can corrupt the heap headers or other structures, or overwrite the stack frames of a specific process, if you write to various locations you have not allocated. Each thread has its own stack, but runs in the same virtual memory space, so you can overwrite something that belongs to another thread. You cannot, however, access the address space of another process like that. – phoxis Aug 29 '13 at 08:44
  • Undefined behavior can make demons fly out of your nose... http://www.catb.org/jargon/html/N/nasal-demons.html – No Idea For Name Aug 29 '13 at 08:45
  • I'm foreseeing a large number of online C++ compilers having their hard drives erased. Also clever programmers who successfully made a *tested* example for this question. – Mark Garcia Aug 29 '13 at 08:46
  • Can it? Of course it can! Will it? Most probably not! – Christian Rau Aug 29 '13 at 10:58
  • While I don't want to encourage relying on UB in any way, people sometimes forget to take such statements for what they actually are: educational *jokes*. In practice, undefined behaviour is still subject to our physical and technological reality. Formatting a hard drive out of mere fun over a UB program isn't much more likely than the existence of demons (the case is different if the program is messing with your hard drive in the first place, as in *MSalters*'s case, but that is an entirely different question). – Christian Rau Aug 29 '13 at 11:11
  • @NoIdeaForName: maybe your nose. Not mine. – thb Sep 06 '17 at 01:27

8 Answers

28

Can it? Sure. Happened to me, in fact.

I wrote code to delete a temporary directory. That involved creating a recursive `delete <temp directory>\*.*` command. Due to a bug, the `<temp directory>` field wasn't always filled in. Our file system code happily executed the recursive `delete \*.*` command.

My colleagues noticed when the icons on their desktop suddenly disappeared. Took out two machines.

MSalters
  • Sounds like that's actually a case of well-defined (albeit undesirable) behaviour, because you know exactly what it's going to do if you don't fill in that field. – Peter Bloomfield Aug 29 '13 at 09:08
  • Oh, explicitly creating the command would have involved only well-defined behavior. The actual bug was UB, and the question was how UB could lead to the deletion of a hard disk. Well, like this: a variable gets corrupted, in this case a path variable, and otherwise bug-free code now starts behaving in unexpected ways. That's a general problem with UB: it breaks strong assumptions about other code. – MSalters Aug 29 '13 at 09:13
  • But this isn't a result of undefined behavior; this is just a result of using uninitialized memory. I would bet the executed delete deleted parts of whatever happened to be in that memory. So the program had UB, of course, but the result wasn't made by UB! – dhein Aug 29 '13 at 10:11
  • @Zaibis: Using uninitialized memory **is** Undefined Behavior. – MSalters Aug 29 '13 at 10:15
  • No, it's undefined what is written in uninitialized memory, but undefined behaviour means the implementation is allowed to do anything; it doesn't even have to care about your code anymore. Your command was a delete on something unknown. It's of course not determinable what will happen, but while this causes UB, the result doesn't have to be equal to the result of UB. The result of UB would be, for example, the executable starting to download porn instead of deleting something. But your example is probably (we can't know, because it MAY be UB) just the result of an unknown invocation. – dhein Aug 29 '13 at 10:41
  • @Zaibis: You're mistaken. You might be thinking of _unspecified_ behavior, which is fairly harmless. But reading uninitialized memory absolutely is not _unspecified_ behavior but outright Undefined Behavior. The compiler may leave out the entire read. E.g. `int i; float f = 0.5; f = i;` may leave `f` set to 0.5, precisely because `i` is uninitialized. Optimizers do remove such assignments, which would not be legal if the value of `i` was merely unspecified. – MSalters Aug 29 '13 at 11:06
  • @Zaibis Yes, but the undefined part (and the scary bit) is that *sometimes it wouldn't*. I mean, if you accidentally freed the memory beforehand, half the time the variable would still have the correct path in. Alternatively, if the string was normally junk, or in debug mode set all to DEADBEEF during development, then it would be unable to open the directory, but not do anything bad because there'd be no valid target. So while it'd be a bug (files still exist), it wouldn't look like a "Nightmare Mark 1" bug. Then the system uses a null-initialised space. BOOM. – deworde Aug 29 '13 at 11:22
  • @deworde The simple point I'm making is: the standard says that if you don't respect the rules, the compiler is allowed to do anything. But in your case it's not just compile-time UB; it's runtime-made UB, from reading random instructions and executing them. And that's not what UB refers to in the standard! – dhein Aug 29 '13 at 12:39
11

Keep in mind that UB is available not only to user-mode code, but also to the system programmer. In other words, if you are writing driver code with UB (or other bugs!) in it, you could end up writing to a piece of memory that is later written back to disk as the "root of the whole disk-data structure".

I have indeed had a bug in a driver I worked on that caused disk corruption, because the driver was using stale pointers (pointer use after free). If you were UNLUCKY, the freed memory happened to be a block owned by the filesystem, so it would write some random garbage back to the disk. Fortunately, it wasn't too difficult to determine what the problem was, and I only needed to reformat the disk once on my test system (when working on drivers, you typically use two computers: one to build the code, and one to test it on. The test machine typically has a minimal install set, and gets reformatted and reconfigured relatively often anyway).

I don't think Scott's remark necessarily means this exact situation, but it's entirely possible that if you have wild-enough code, it can cause almost anything to happen, including finding holes in the security system (cf. all the stack-smashing exploits that have been successful). You probably have to be VERY unlucky to achieve that, but people also win those mega-lotteries from time to time. If something has one chance in several million of happening, a computer that performs many millions of operations per second can achieve much less likely things...

Mats Petersson
7

From the C++11 Standard (actually from Draft N3337), in section 1.3 Terms and definitions [intro.defs] (emphasis mine):

> undefined behavior
> behavior for which this International Standard imposes no requirements
> [ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. — end note ]

From "no requirements" + "unpredictable results" we can conclude that (in theory) anything could happen.

Now, no "reasonable" compiler would purposely emit code to erase the hard drive for, e.g., a division by 0, but it could happen if you mess with the file system or indeed, as you said, if you corrupt memory (edit: see MSalters' comment on their own answer).

The point here is: always be careful to never ever ever invoke undefined behavior. "Here Be Dragons."

(In practice it can be hard to be sure that your program is well-defined. Here's some advice: know your language well, and keep away from the dusty corners. If a piece of code seems suspect or too complex, try to rewrite it to make it simpler and clearer. Always compile with the highest warning level, and don't ignore the warnings. There are also compiler flags like `-fsanitize=undefined` (formerly `-fcatch-undefined-behavior`) and tools like lint that can help. And testing, of course, but by then it's a bit late.)

gx_
  • Something like division by zero is not so harmless as it may appear. Many processors have a "divide by zero" trap, and some operating systems will allow user code to install a handler for it. If a routine installs a divide-by-zero handler which modifies one of that routine's automatic variables and returns, and that routine exits with the handler still installed, then a divide-by-zero which occurs in another completely different routine could trash some arbitrary value on the stack before continuing execution, leading to unbounded nastiness. – supercat Sep 23 '13 at 18:08
6

A memory violation can theoretically result in your program executing the wrong code. If you're very unlucky it could be code which deletes stuff on your hard drive. I suspect it's unlikely to go that far though, unless you're handling low-level disk operations yourself.

I think the point of the statement is that you need to take undefined behaviour extremely seriously, and do everything practical to guard against it (i.e. defensive programming). I've seen too many bad programmers naively relying on some undefined behaviour, assuming it will work the same all the time. In practice, it's not predictable, and sometimes the result can be catastrophic.

Peter Bloomfield
2

A simple example would be that you happen to corrupt the block number you're writing to, or the file name you're about to delete.

2

Yes.

Consider an application that processes external input (for example a component of a web application) and that has a buffer overflow, which is a fairly common type of undefined behavior.

An attacker notices this and deliberately crafts input that erases all the data. (Most attackers actually don't do that: what they want to do is to retrieve your data, or to plant content on your site. But occasionally some do want to erase your files.)

The maximal extent of the damage depends on what security layers the attacker is able to bypass. If the server hasn't been configured securely, or if there are other vulnerabilities that the attacker is able to exploit, then the attacker may be able to gain administrator privileges on the machine or use it as a relay to attack other machines. All this from a single buffer overflow.

The lesson to keep from this is that undefined behavior is not just about things that are likely to happen. Not only can things happen that you wouldn't expect (some compilers are very good at performing weird optimizations that are only correct when a variable isn't modified twice between sequence points, and doing something very surprising otherwise), but things can happen that are mathematically extremely unlikely, because someone deliberately went out of their way to make them happen.

Gilles 'SO- stop being evil'
1

In Linux, any operation is valid when you are the root user, even destroying your root filesystem: `rm -rf /`.

Every code segment (with bugs) is happily executed when you are root. All UB is assumed to be intentional when it runs with sudo permissions.

sr01853
  • I'm not sure this is the greatest way to explain this. It's not as if undefined behavior is intentionally given root; it's just... undefined. In theory, non-root undefined behavior shouldn't be able to do anything that regular non-root code can't, but if you're already unlucky... well, see @Mat's story above. – Matt Krause Aug 29 '13 at 15:22
0

Er, no offense, but several of the other answers are wrong or at any rate misleading, in my experience.

The C++ standard does not constrain undefined behavior. The operating system, however, normally does constrain it. The reason: the behavior is undefined only with respect to C++.

> ... always be careful to never ever ever invoke undefined behavior.

Nonsense. Experience belies this suggestion. C++ programmers often inadvertently invoke undefined behavior during testing. Sometimes I do it on purpose, just to see what happens.

Now, I realize that someone thinks that I am flaunting foolhardiness here, but really, your laptop is hardly more likely to catch fire with undefined than with defined behavior. Undefined behavior in C++ still compiles to assembly code with defined behavior. Think about that. The assembly's behavior remains defined. It's just that the C++ standard no longer understands the mechanics.

There are times you want to provoke undefined behavior just to see what's going on on the stack.

If you are in an environment in which it is possible to write a defined C++ program that catches your laptop on fire, then you will have to be careful in any case; but the main problem in that case is a lack of hardware- and/or kernel-based protection.

In sum, don't let the C++ standard confuse you. It's just informing you what its own limits of competence are.

thb
  • The philosophy you describe would IMHO be appropriate for someone writing a quality general-purpose compiler, but modern optimizers allow UB to toss the laws of time and causality out the window. If a compiler can figure out that any state where `x` isn't 42 would yield UB, it may (and, from what I can tell, according to some people's philosophy, should) generate code that behaves as though `x` is 42. If a program would do something dangerous when `x` is 42, and relies upon safety code to prevent `x` from becoming 42 when it shouldn't, such optimization may completely bypass all safety measures. – supercat Sep 06 '17 at 16:48
  • I think you may be confusing the notions of "what any sane compiler would do" with "what gcc and clang actually do". See `https://godbolt.org/z/zY88xeh8a` for an example. If you were asked to inspect the source code and determine whether it could write beyond the end of `arr[]` if `x` exceeded 32769, would you see any way that could happen? I don't think such a thing could happen with any sane compiler *that's designed to be suitable for general-purpose use*, but the code generated by gcc at -O2 will unconditionally store 0 to `arr[x]`. – supercat May 17 '23 at 15:14
  • It shouldn't take much imagination to see how something like that could allow a malicious person to craft malformed data that would cause catastrophic malfunction and likely allow arbitrary code execution. Such optimizations may be reasonable in code that will never receive data from untrustworthy sources, but in most general-purpose code it would be reckless to assume that there will never be any malicious inputs. – supercat May 17 '23 at 15:28