15

I like the LWN article "Crash-only software" and I would like to learn more about crash-safe and fault-tolerant programming.

It is surprisingly hard to ensure that persistent state stays consistent when faults occur. I am not even talking about distributed operations; it is hard on a single node, too. Even plain Berkeley DB (BDB Data Store or BDB Concurrent Data Store) can end up with a corrupted database if the system crashes. It is not only that high-level application constraints may be broken; the database may fail to open at all after a crash.
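To make the single-node problem concrete, here is a minimal sketch of one common POSIX technique for replacing a file so that a crash leaves either the old or the new contents, never a half-written mix: write to a temporary file, `fsync` it, `rename` it over the target, then `fsync` the directory. The function names (`write_all`, `sync_parent_dir`, `atomic_replace`) and the `.tmp` suffix are assumptions made for illustration, and actual durability still depends on the file system and storage hardware honouring `fsync`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: write the whole buffer, retrying on short writes and EINTR.
static bool write_all(int fd, const char* data, size_t len) {
    while (len > 0) {
        ssize_t n = ::write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        data += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Hypothetical helper: fsync the directory that contains `path`,
// so that the rename itself is made durable.
static bool sync_parent_dir(const std::string& path) {
    std::string::size_type slash = path.find_last_of('/');
    std::string dir = (slash == std::string::npos) ? "." : path.substr(0, slash);
    if (dir.empty()) dir = "/";
    int dfd = ::open(dir.c_str(), O_RDONLY);
    if (dfd < 0) return false;
    bool ok = (::fsync(dfd) == 0);
    ::close(dfd);
    return ok;
}

// Replace `path` with `contents`. Illustrative sketch, not a library API.
bool atomic_replace(const std::string& path, const std::string& contents) {
    assert(!path.empty());
    const std::string tmp = path + ".tmp";  // assumed naming scheme
    int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;

    // The data must reach stable storage before the rename publishes it.
    bool ok = write_all(fd, contents.data(), contents.size()) && ::fsync(fd) == 0;
    if (::close(fd) != 0) ok = false;
    if (!ok) {
        ::unlink(tmp.c_str());
        return false;
    }

    // rename() atomically switches the name from the old file to the new one.
    if (::rename(tmp.c_str(), path.c_str()) != 0) {
        ::unlink(tmp.c_str());
        return false;
    }
    return sync_parent_dir(path);
}
```

A call such as `atomic_replace("state.txt", "new contents")` returns `true` only if every step succeeded. This only covers whole-file replacement; anything that needs multi-record transactions would still want a write-ahead log or a transactional store.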

What are good resources about crash-safe and fault-tolerant designs, approaches, and programming?

If the resources focus on C++ and POSIX environments, I would appreciate that.

dmeister
  • 1
    Side note: in the latest Mac OSX (Snow Leopard) the OS just sends a SIGKILL to all applications which are in a supposedly 'clean' state. Impressive how this really results in a 1-second shutdown (on a newer machine, I must admit). Reference: http://developer.apple.com/mac/library/releasenotes/MacOSX/WhatsNewInOSX/Articles/MacOSX10_6.html#//apple_ref/doc/uid/TP40008898-SW22 – ChristopheD Mar 08 '10 at 22:18

4 Answers

6

Akka is a framework for Java and Scala that is written with let-it-crash in mind. See this article and this presentation for an introduction to actors and let-it-crash. The approach is also known as fail-fast or the worker/supervisor style.
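Since the question asks about C++ and POSIX, the worker/supervisor idea can be sketched with plain `fork` and `waitpid`: the worker runs in a child process and is allowed to crash, and the supervisor simply restarts it. This is only a rough illustration of the style, not how Akka or Erlang implement supervision; the `supervise` function, its signature, and the restart limit are assumptions made up for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical supervisor: runs `worker` in a child process and restarts it
// whenever it crashes or exits with a non-zero status.
// Returns the number of restarts that were needed, or -1 if the worker never
// succeeded within `max_restarts` restarts (or fork/waitpid failed).
int supervise(const std::function<int(int)>& worker, int max_restarts) {
    assert(max_restarts >= 0);
    for (int attempt = 0; attempt <= max_restarts; ++attempt) {
        pid_t pid = ::fork();
        if (pid < 0) return -1;  // could not create a worker process

        if (pid == 0) {
            // Child: do the work. A crash here cannot corrupt the supervisor,
            // because the two run in separate address spaces.
            ::_exit(worker(attempt));
        }

        // Parent: wait for this worker to finish, retrying if interrupted.
        int status = 0;
        while (::waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR) return -1;
        }

        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            return attempt;  // worker finished cleanly
        }
        // Otherwise the worker was killed by a signal or reported failure:
        // fall through and start a fresh one.
    }
    return -1;  // gave up after too many restarts
}
```

For example, `supervise([](int attempt) { if (attempt < 2) std::abort(); return 0; }, 5)` lets the worker crash twice and then succeed on the third run. A real supervisor would usually add back-off between restarts and keep the worker's recoverable state somewhere that survives the crash.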

Two good presentations on Erlang are Systems that Never Stop (and Erlang) and Message Passing Concurrency in Erlang.

Theron is an actor library for C++, and I think there is something similar in Boost as well.

Erlang can also call C or C++ code; see this for a discussion. Java, Scala, and Akka can call C++ code as well.

(If you like C++, I suggest you have a look at Scala. It is a very nice language, and better than Java if you come from C++.)

Jonas Bonér's presentation Scalability, Availability & Stability Patterns is also a good one on the topic.

Vadim Kotov
oluies
  • 3
    If you let Java (or Scala) call a C++ DLL by using JNI, then the stability of the JVM is endangered. Since the C++ code runs in the same process as the JVM, the JVM will die if you get a crash in the C++ code. JNI does not work very well, do not use it. – olle kullberg Jul 27 '10 at 09:43
  • JNI/JNA works great if you understand the complexities involved. Crashing your JVM because the C code you called seg faulted isn't JNI's fault. – Matt Wonlaw Sep 30 '11 at 16:36
1

The actor model in languages such as Erlang and Scala follows the let-it-crash model. See this article.

TTMAN
0

If you want to implement fault-tolerance features in C++, you will basically end up rewriting Erlang. Don't reinvent the wheel: Erlang/OTP is there and has been battle-tested for 35+ years. Use it!

Farshid Ashouri
0

To add to the answers above, there are also Groovy and GPars, which have not been mentioned yet. Of course, this is not C++.

There is another experimental C++ library called libcppa. Theron is more mature.

Anyway, your best bet is to use either:

  1. Erlang
  2. Scala / Akka