Isolate crash prone (SEGV) but speed critical legacy code into a separate binary

Question

I have a code base (mostly C++) which is well tested and crash free. Mostly. A part of the code -- which is irreplaceable, hard to maintain or improve and links against a binary-only library* -- causes all crashes. These do not happen often, but when they do, the entire program crashes.

          +----------------------+
          | Shiny new sane       |
          |  code base           |
          |                      |
          |  +-----------------+ |   If the legacy code crashes,
          |  |                 | |   the entire program does, too.
          |  |   Legacy Code   | |
          |  | * Crash prone * | |
          |  | int abc(data)   | |
          |  +-----------------+ |
          |                      |
          +----------------------+

Is it possible to extract that part of the code into a separate program, start that from the main program, move the data between these programs (on Linux, OS X and, if possible, Windows), tolerate crashes in the child process and restart the child? Something like this:

        +----------------+         // start,
        | Shiny new sane | ------. // re-start on crash
        |  code base     |       | //   and
        |                |       v // input data
        |                |    +-----------------+
        |   return       |    |                 |
        |   results <-------- |   Legacy Code   |
        +----------------+    | * Crash prone * |
                              |  int abc(data)  |
       (or not results        +-----------------+
        because abc crashed)

Ideally the communication would be fast enough so that the synchronous call to int abc(char *data) can be replaced transparently with a wrapper (assuming the non-crash case). And because of slight memory leaks, the legacy program should be restarted every hour or so. Crashes are deterministic, so bad input data should not be sent twice.
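To make these requirements concrete, here is a rough sketch of the bookkeeping such a wrapper might need. It is platform-neutral because the actual IPC is hidden behind two callbacks. `LegacyProxy`, `call_child` and `restart_child` are hypothetical names invented for this illustration; `call_child` is assumed to return `false` when the child died while processing the input.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical wrapper around the isolated legacy call. It remembers inputs
// that crashed the child and restarts the child once it exceeds a maximum age.
class LegacyProxy {
public:
    using Clock = std::chrono::steady_clock;
    // Assumed contract: returns false if the child crashed on this input.
    using CallChild = std::function<bool(const std::string&, int&)>;
    // Assumed contract: (re)starts the child process, killing any old one.
    using RestartChild = std::function<void()>;

    LegacyProxy(CallChild call_child, RestartChild restart_child,
                std::chrono::seconds max_age = std::chrono::hours(1))
        : call_child_(std::move(call_child)),
          restart_child_(std::move(restart_child)),
          max_age_(max_age) {
        assert(call_child_ && restart_child_);
        restart();  // initial start of the child
    }

    // Returns true and fills `result` on success; false if the input is
    // known to crash the child or just crashed it.
    bool abc(const std::string& data, int& result) {
        if (known_bad_.count(data) != 0) return false;  // never send twice
        if (Clock::now() - started_ >= max_age_) restart();  // leak control
        if (call_child_(data, result)) return true;
        known_bad_.insert(data);  // crashes are deterministic: remember it
        restart();
        return false;
    }

private:
    void restart() {
        restart_child_();
        started_ = Clock::now();
    }

    CallChild call_child_;
    RestartChild restart_child_;
    std::chrono::seconds max_age_;
    Clock::time_point started_;
    std::set<std::string> known_bad_;
};
```

Storing whole inputs in `known_bad_` is the simplest option; for large inputs, storing a hash instead would trade memory for a small risk of false positives.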

The code base is C++11 and C; notable external libraries are Qt and Boost. It runs on Linux, OS X and Windows.

--
*: some of the crashes/leaks stem from this library which has no source code available.

My suggestion would be to rewrite the legacy code - that will be a much better solution than trying to work around the fact that it crashes. — Mats Petersson, Dec 22 '15 at 11:08
Unfortunately a rewrite is **absolutely not** possible, too much very specialized knowledge is required for it and those who wrote it are no longer around. — toting, Dec 22 '15 at 11:10
You have the code. You can see what it does. How much specialty knowledge could possibly be needed? Whether or not it's extractable depends on the code - and if you don't understand it enough to rewrite it, you don't understand it enough to be certain anything you do with it is safe. — Andrew Henle, Dec 22 '15 at 11:22
You can use a socket: if the `Shiny` code cannot connect, it launches the legacy process. The legacy process listens, waits for data and answers with the `int`. The client waits for the answer; if the connection is lost, the data triggered the bug, so do not retry it. — Ôrel, Dec 22 '15 at 11:48
Possible duplicate of [Catch Segfault or any other errors/exceptions/signals in C++ like catching exceptions in Java](http://stackoverflow.com/questions/6008470/catch-segfault-or-any-other-errors-exceptions-signals-in-c-like-catching-excep) — user3528438, Dec 22 '15 at 13:56
@user3528438 no, the first answer explicitly states that catching segfaults is a very bad idea. Also, it is linux only and suggests a platform specific pipe solution, but not in any detail. — toting, Dec 22 '15 at 14:04

Martin Bonner supports Monica · Accepted Answer · 2015-12-22T14:16:07.117

Well, if I were you, I wouldn't start from here ...

However, you are where you are. Yes, you can do it. You are going to have to serialize your input arguments, send them, deserialize them in the child process, run the function, serialize the outputs, return them, and then deserialize them. Boost will have lots of useful code to help with this (see asio).
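As a rough illustration of that round trip (and of surviving a crash), here is a minimal POSIX-only sketch using `fork()` and a pipe. It is not the portable solution: it will not build on Windows, and `fork()` in a multi-threaded program has its own pitfalls. Because `fork()` hands the child a copy of the input, only the result crosses the pipe here; a long-lived separate binary would have to serialize the input as well. `run_isolated`, `abc_ok` and `abc_crash` are hypothetical names made up for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs fn(data) in a forked child. Returns true and fills `result` only if
// the child exited normally and delivered a complete result over the pipe.
bool run_isolated(int (*fn)(const char*), const char* data, int& result) {
    assert(fn != nullptr);
    int fds[2];
    if (pipe(fds) != 0) return false;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        // Child: run the legacy call and send the result back.
        close(fds[0]);
        int r = fn(data);
        ssize_t n = write(fds[1], &r, sizeof r);
        _exit(n == static_cast<ssize_t>(sizeof r) ? 0 : 1);
    }

    // Parent: read the result, then reap the child.
    close(fds[1]);
    int r = 0;
    ssize_t n = read(fds[0], &r, sizeof r);  // returns 0 (EOF) if the child died
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;

    if (n != static_cast<ssize_t>(sizeof r)) return false;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return false;
    result = r;
    return true;
}

// Hypothetical stand-ins for the legacy function, only to exercise the sketch.
int abc_ok(const char* data) { return static_cast<int>(std::strlen(data)); }
int abc_crash(const char*) { std::abort(); }
```

Forking per call is too slow for a speed-critical path; a real setup would more likely keep one child alive and feed it requests, with the same "no complete answer means it crashed" rule on the parent side.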

Global variables will make life much more "interesting". Does the legacy code use Qt? - that probably won't like being split into two processes.

If you were using Windows only, I would say "use DCOM" - it makes this very simple.

Restarting is simple enough if the legacy code is only used from one thread (the code which handles "return" just looks to see if it needs to restart, and kills the process). If you have multiple threads, then the shiny code will need to check if a restart is required, block any further threads, wait until all calls have returned, restart the process, and then unblock everything.
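The multi-threaded case could be sketched roughly like this, using only C++11 primitives. `RestartGate` is a hypothetical name; each thread would call `enter()` before and `leave()` after a call into the child, and whoever decides a restart is due calls `restart()` with a callback that does the actual kill and relaunch.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>

// Hypothetical gate: lets calls through until a restart is requested, then
// holds new callers back, waits for in-flight calls, restarts, and reopens.
class RestartGate {
public:
    // Call before invoking the legacy function; blocks during a restart.
    void enter() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !restarting_; });
        ++active_;
    }

    // Call after the legacy function has returned (or its child has crashed).
    void leave() {
        std::lock_guard<std::mutex> lock(m_);
        assert(active_ > 0);
        --active_;
        if (active_ == 0) cv_.notify_all();  // wake a pending restart
    }

    // Blocks new callers, waits until no call is in flight, runs do_restart.
    // Must not be called by a thread that is between enter() and leave().
    void restart(const std::function<void()>& do_restart) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !restarting_; });  // one restart at a time
        restarting_ = true;
        cv_.wait(lock, [this] { return active_ == 0; });
        do_restart();  // safe: nobody is using the child right now
        restarting_ = false;
        lock.unlock();
        cv_.notify_all();  // unblock everything
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int active_ = 0;
    bool restarting_ = false;
};
```

Wrapping `enter()`/`leave()` in a small RAII guard would make it harder to forget the `leave()` on an early return.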

Boost::interprocess looks to have everything you need for the communication - it's got shared memory, mutexes, and condition variables. Boost::serialization will do the job for marshalling and unmarshalling.
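To show what the marshalling step amounts to, here is a hand-rolled stand-in (deliberately not Boost.Serialization, whose archive API is not shown here) for the simple case of a byte buffer in and an `int` out. The wire layout, the version field and all function names are assumptions made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Assumed protocol version; both processes reject anything else.
const std::uint32_t kProtocolVersion = 1;

// Appends a 32-bit value in little-endian byte order.
void put_u32(Bytes& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Reads a little-endian 32-bit value at `offset`; false if out of range.
bool get_u32(const Bytes& in, std::size_t offset, std::uint32_t& v) {
    if (in.size() < offset + 4) return false;
    v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[offset + i]) << (8 * i);
    return true;
}

// Request layout: version (4 bytes), payload length (4 bytes), payload.
Bytes encode_request(const std::string& data) {
    assert(data.size() <= 0xFFFFFFFFu);
    Bytes out;
    put_u32(out, kProtocolVersion);
    put_u32(out, static_cast<std::uint32_t>(data.size()));
    out.insert(out.end(), data.begin(), data.end());
    return out;
}

bool decode_request(const Bytes& in, std::string& data) {
    std::uint32_t version = 0, len = 0;
    if (!get_u32(in, 0, version) || version != kProtocolVersion) return false;
    if (!get_u32(in, 4, len)) return false;
    if (in.size() != 8 + static_cast<std::size_t>(len)) return false;
    data.assign(in.begin() + 8, in.end());
    return true;
}

// Result layout: version (4 bytes), the int result (4 bytes).
Bytes encode_result(std::int32_t r) {
    Bytes out;
    put_u32(out, kProtocolVersion);
    put_u32(out, static_cast<std::uint32_t>(r));
    return out;
}

bool decode_result(const Bytes& in, std::int32_t& r) {
    std::uint32_t version = 0, raw = 0;
    if (in.size() != 8) return false;
    if (!get_u32(in, 0, version) || version != kProtocolVersion) return false;
    if (!get_u32(in, 4, raw)) return false;
    r = static_cast<std::int32_t>(raw);
    return true;
}
```

The fixed byte order and explicit version check are there so that a main program and a child built at different times fail loudly instead of misreading each other's data.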

No global variables or Qt on the legacy side; only one or two pure functions are of interest. Are TCP/UDP sockets the only portable way? Shared memory and signals would be other low-level tools that come to mind. — toting, Dec 22 '15 at 14:00
Sockets are certainly the most portable way. Boost has a shared memory library, so that ought to work. I don't know about signals; they are very different in the POSIX and Windows worlds. You would still need to serialize/deserialize if your arguments/results contain pointers or have virtual functions (and it would probably be a good idea anyway - otherwise version skew is going to cause nightmarish problems). — Martin Bonner supports Monica, Dec 22 '15 at 14:07
