
I recently found that a fairly large image manipulation program I'm writing in C on a Windows 8 machine has a bug that shows up only in very particular circumstances. Unfortunately, the bug brings my entire computer to a standstill, so my only option is to pull the plug on the computer (especially annoying when I'm working remotely).

Because it's an image manipulation program, I can't just flood it with print statements to isolate the problematic section - the problem occurs somewhere in a loop that's called billions of times, so adding a printf slows it down to the point that it would take days to get to a failing iteration.

I understand if this question is too broad, since it isn't really reasonable for me to post all of the code that could be causing my problem. I'm simply asking:

What are the circumstances in which C code can, instead of seg faulting or halting the program, actually freeze the entire OS?

When I search for the problem, I find code golf questions like this:

A C program which crashes the system(shuts down the system)

This is not what I'm asking - obviously I haven't written system("shutdown") anywhere in my loop.

Being most familiar with Python and Java, I'm not used to this kind of problem, but in my experience:

  • Dividing by zero produces a seg fault
  • Accessing memory by accident that is slightly outside an intended array causes a seg fault (sometimes down the road a little)
  • Accessing protected memory causes the program to hang
  • Stack overflow causes a seg fault
  • Dereferencing a non-initialized pointer causes a seg fault

Is this impression false - could those cases cause the whole system to crash? What cases am I missing? Is it dependent on my version of gcc, or my permission status?

I haven't been able to try to reproduce it on a different operating system yet, as it requires a few dependencies to run the entire program.

If my only option is to sit for days waiting for the program to run with print statements, or avoid weird situations, then, of course, so be it. I'm looking for key places to look for the bug.

en_Knight
  • Try pressing `Ctrl+Shift+ESC` and wait a bit. Within a minute the Task Manager window should open, check if there is a high CPU utilization process running (your app?) – xmojmr Oct 21 '14 at 06:22
  • I've tried. I waited... and waited... and waited. With the time I waited I think I could have solved the halting problem and proved that the operating system was never coming back alive :) I could run it on a virtual computer I suppose, and then monitor it from my main computer, and then use that trick – en_Knight Oct 21 '14 at 06:26
  • One thing that can make the whole OS very unresponsive when a loop runs billions of times is memory exhaustion and page swapping. Any user application (e.g. yours) can consume more and more memory until the system starts swapping pages to disk (the hard disk LED activity indicator would blink). Check your `new`/`delete`. This is not something you had to do in Java at all – xmojmr Oct 21 '14 at 06:32
  • Instead of print statements, write your debug output to a file. It might still be slow for billions of iterations, but not as slow as writing to the screen. – Paul Rooney Oct 21 '14 at 07:00
  • If your C program causes the OS to crash (and you're running on a common desktop/server OS) then 1 of 4 things is going on: 1. Your hardware is defective/failing. 2. There's a bug in the OS kernel (which includes drivers running in kernel mode). 3. You're writing kernel code and crashing the kernel. 4. You're accessing low level privileged pieces of the OS and screwing up the system (e.g. shredding the hard drive the OS runs from or killing system critical processes). – nos Oct 21 '14 at 12:54
  • Do you observe any disk I/O (i.e. the HDD LED is constantly lit, or you can hear your HDD working)? – GreenScape Oct 21 '14 at 13:13
  • The root of the problem: "I'm writing in C on a Windows 8 machine". You might try running in a VM and executing your application under a debugger. You did not post the relevant code, so I cannot evaluate it. However, I would suspect a small memory leak that (over billions of executions) results in an out-of-memory lockup. – user3629249 Oct 21 '14 at 15:04
  • @nos number 1 is always possible, though it's a pretty new computer (1 month). I'm not writing kernel code. Could there really be a bug in Windows 8 killing my program? Seems unlikely... Could I be accessing low level privileged pieces by mistake? – en_Knight Oct 21 '14 at 15:13
  • @GreenScape no, I don't have any hints like that, disk sounds fine. – en_Knight Oct 21 '14 at 15:14
  • @en_Knight in order to make your question actually useful to some future reader, would you please clarify what was the final root cause of your problem? (because the accepted answer says basically that under normal circumstances this is not possible) – xmojmr Oct 21 '14 at 17:45
  • @xmojmr yes I will - I'm testing the code on other computers today so hopefully that will help me get to the source – en_Knight Oct 21 '14 at 18:30
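
The file-logging suggestion in the comments above can be made cheaper still by writing a checkpoint only every N iterations. What follows is a minimal sketch, not the asker's code: `log_checkpoint`, `run_loop`, and the log path are hypothetical names made up for illustration, and the loop body is a placeholder for the real per-pixel work.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write one line every `interval` iterations and flush
   it, so the checkpoint has left the program's own buffer before a freeze.
   Returns 1 if a line was written, 0 otherwise. */
int log_checkpoint(FILE *log, unsigned long long iter,
                   unsigned long long interval)
{
    assert(log != NULL);
    if (interval == 0 || iter % interval != 0)
        return 0;
    if (fprintf(log, "reached iteration %llu\n", iter) < 0)
        return 0;
    fflush(log);
    return 1;
}

/* Hypothetical driver standing in for the real image loop.
   Returns the number of checkpoint lines written. */
unsigned long long run_loop(const char *path,
                            unsigned long long iterations,
                            unsigned long long interval)
{
    FILE *log = fopen(path, "w");
    unsigned long long written = 0;

    if (log == NULL)
        return 0;
    for (unsigned long long i = 0; i < iterations; ++i) {
        /* ... per-pixel work would go here ... */
        written += (unsigned long long)log_checkpoint(log, i, interval);
    }
    fclose(log);
    return written;
}
```

After a freeze and reboot, the last line in the log narrows the failure down to a window of `interval` iterations, which can then be rerun with finer logging. Note that `fflush` only hands the data to the OS. If the whole machine locks up, the final line may still be lost, so treat the last checkpoint as a lower bound.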

2 Answers


On modern systems with hardware-enforced privilege separation between user mode and kernel mode, and an operating system that configures these mechanisms correctly, you simply cannot crash the system from a user-mode process.

Any of those errors is trapped by the CPU, which calls exception handlers in the OS, which will quickly pull the plug on your process.
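
As a rough illustration of the fault being handed back to the offending process rather than taking down the machine, here is a small sketch using the standard C `signal` interface. It is only an approximation of what the OS does: the names `record_fault`, `run_with_fault_handler`, `simulated_fault`, and `well_behaved` are made up for this example, and the fault is simulated with `raise(SIGSEGV)` rather than a real bad memory access, so that returning from the handler is well defined.

```c
#include <assert.h>
#include <signal.h>

/* Set by the handler; sig_atomic_t is the type the C standard allows a
   signal handler to write to. */
volatile sig_atomic_t last_signal = 0;

/* Hypothetical handler: just remember which signal arrived. */
void record_fault(int sig)
{
    last_signal = sig;
}

/* Run `work` with a SIGSEGV handler installed and report which signal,
   if any, was delivered to this process (0 means none). */
int run_with_fault_handler(void (*work)(void))
{
    void (*old_handler)(int) = signal(SIGSEGV, record_fault);

    assert(old_handler != SIG_ERR);
    last_signal = 0;
    work();
    signal(SIGSEGV, old_handler);   /* restore the previous disposition */
    return (int)last_signal;
}

/* Simulated fault: raise() delivers the signal synchronously, so the
   handler may return normally. A real invalid access must not do that. */
void simulated_fault(void)
{
    raise(SIGSEGV);
}

void well_behaved(void)
{
}
```

Calling `run_with_fault_handler(simulated_fault)` returns `SIGSEGV`. Without a handler installed, the default action for that signal is to terminate the process, and only the process.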

If I had to guess, a piece of hardware is overheating or malfunctioning:

  • Overheating CPU due to poor thermal conductivity with heatsink
  • Failing / under-sized power supply
  • Failing DIMMs
  • Failing hard drive
  • Failing CPU
  • Failing / overheating GPU

I've seen cryptocoin-mining software bring a system to its knees because it was pushing the limits of the GPU. When the card would lock up or reset, the driver would get confused or lock up, and the system would end up needing to be rebooted.

Your system is doing next to nothing when you're just sitting there browsing the web, etc. But running a CPU-intensive application can bring out problems that you didn't know were there, and that can be what locks the system up.

While this is a little out-of-place on Stack Overflow, it falls into one of those grey areas between hardware and software. I would stress-test your system, keeping an eye on CPU/GPU/memory temperatures and power supply voltages. Check out MemTest86 and Stresslinux.

Jonathon Reinhart
  • It's a brand new computer, a few weeks old. I'm monitoring the CPU, GPU and the memory. The memory usage definitely gets up there, but it's nowhere near my RAM limits. I'm also able to run all sorts of other intensive applications, even this one in other circumstances. I'll definitely check out those applications though; it could of course be something I'm missing – en_Knight Oct 21 '14 at 06:08
  • Are you using any GPU acceleration? – Jonathon Reinhart Oct 21 '14 at 06:12
  • No, I'm not; in fact I took out all of my GPU code for debugging and it didn't help. It doesn't seem like it, from your answer, but could it matter where the data is stored (data section, heap, stack) vis-a-vis how an indexing error would show itself? – en_Knight Oct 21 '14 at 06:14
  • No. You shouldn't be able to do anything from user mode to cause your system to lock up (other than perhaps call buggy kernel / driver services). All of those places where data is stored are "just memory" - there's certainly nothing there that could cause your system to lock up. – Jonathon Reinhart Oct 21 '14 at 06:15
  • Okay, I believe you - you're probably right that it's hardware related. Testing it on a different computer should confirm that. I'll accept this if blinding inspiration doesn't come forth in the form of a different answer soon. – en_Knight Oct 21 '14 at 06:19
  • Hardware problems can manifest themselves in the most strange and unpredictable ways. Best of luck to you! – Jonathon Reinhart Oct 21 '14 at 06:20
  • 'It's a brand new computer, a few weeks old' - I'm betting on 'crap driver'. – Martin James Oct 21 '14 at 17:15

The most trivial cause of an OS freeze is "memory full". If you have processes that use a lot of memory, your system is going to swap from main memory (typically RAM) to secondary memory (typically disk), which leads to a huge overhead. As a user, what you usually observe is an almost frozen computer, sometimes so frozen that you think it has crashed. If your OS is badly designed, then it sometimes does crash!
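
One cheap way to check whether a hot loop is slowly eating memory is to route its allocations through counting wrappers. This is only a sketch under assumptions: `tracked_malloc`, `tracked_free`, `live_blocks`, and `blocks_left_after` are hypothetical names, not part of any library, and the loop body stands in for whatever the real program allocates per iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counter: number of blocks allocated but not yet freed. */
size_t live_blocks = 0;

/* Wrapper around malloc that counts successful allocations. */
void *tracked_malloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        ++live_blocks;
    return p;
}

/* Wrapper around free that counts releases; free(NULL) stays a no-op. */
void tracked_free(void *p)
{
    if (p == NULL)
        return;
    assert(live_blocks > 0);
    --live_blocks;
    free(p);
}

/* Stand-in for the real loop: allocates a scratch buffer per iteration and,
   if `forget_to_free` is nonzero, deliberately leaks it.
   Returns how many blocks the loop left behind. */
size_t blocks_left_after(size_t iterations, int forget_to_free)
{
    size_t before = live_blocks;

    for (size_t i = 0; i < iterations; ++i) {
        unsigned char *scratch = tracked_malloc(64);

        if (scratch == NULL)
            break;
        scratch[0] = 0;              /* pretend to use the buffer */
        if (!forget_to_free)
            tracked_free(scratch);
    }
    return live_blocks - before;
}
```

If the count after a batch of iterations is higher than before it, something in the loop is not releasing what it allocates. Over billions of iterations, even a small per-iteration leak adds up to the swapping described above.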

Jean-Baptiste Yunès
  • That's what I expected was the problem, but I've been monitoring memory closely. It definitely is being eaten up, but when the program crashes, it typically is using much less memory than other times during execution when things are running smoothly – en_Knight Oct 21 '14 at 15:06