
I'm trying to connect to a DB and fetch some data, and the requirement is: if the DB connection fails, I fall back to a file-based approach.

The problem, however, is that every time the DB connection fails, the program dies with a "Memory fault(coredump)" error, and it never continues on to the file-based approach.

I've tried `eval {} or do {};`, which doesn't work either (i.e. it still stops at the Memory fault error). Any idea how I can recover from a "Memory fault(coredump)"? Thanks so much.

So basically, here's the problematic code:

    use DBI;    # DBD::Sybase is loaded implicitly via the DSN

    my $db =
        "dbi:Sybase:server=$dbserver;database=$dbname;kerberos=$dbKerbPrincipal";
    my $dbh = DBI->connect($db, '', '');
    # BANG! If $dbname doesn't exist, you get the Memory fault,
    # and can't recover from it.

So I deliberately chose a wrong DB name, just to make the DB connection fail and then verify the follow-on "fail over" functionality. But I can't recover from the above `DBI->connect()` call.

Tumer

2 Answers


Perl + DBI + DBD::Sybase should not be crashing. Period.

You should report the problem on the correct forum - the dbi-users@perl.org mailing list. When you report the problem, you must quote versions - of the operating system, Perl, DBI, DBD::Sybase, the Sybase libraries, and the Sybase server (yes, you need to know the versions of lots of things when working with Perl and DBI).

Are you able to connect reliably to your DB server with regular Sybase programs from the same machine where your Perl is failing? Do those connections use Kerberos?

That information will probably be critical. It is conceivable that you are testing something which has not been tested before; I'm not sure how widespread Kerberos-authenticated connections are.

Jonathan Leffler
  • It was a problem with the Perl version; 5.8.4 causes this problem. – Tumer Jan 11 '11 at 09:14
  • @Tumer: Interesting. I've not often seen Perl crash like that, ever. As with most software, the later versions are usually better than older versions, and 5.8.4 is now fairly old (April 2004?). Also, reporting version information is crucial with DBI and DBD problems, even when the source actually is Perl. – Jonathan Leffler Jan 11 '11 at 14:17

First of all, I agree with the previous posters and commenters - normally, you shouldn't be getting segfaults in this situation, and the proper fix is figuring out why it segfaults and correcting that (it could very well be a library bug, as Jonathan said).

HOWEVER, to answer your direct technical question: you can NOT recover a Perl interpreter (or any other program, for that matter) once it segfaults. You wouldn't WANT to anyway - it is likely in a messy state.
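You can see this for yourself without Sybase at all. In the sketch below, the fatal signal is faked with `kill 'SEGV'` (a stand-in for whatever the DBD layer is doing internally, since we can't reproduce the real crash here); the `eval` never gets a chance to run its recovery code:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(WIFSIGNALED WTERMSIG);

# Run the doomed code in a child so that we survive to report on it.
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {                 # child
    eval {
        kill 'SEGV', $$;         # stand-in for the crashing DBI->connect()
    };
    print "eval caught it\n";    # never reached: eval can't trap SIGSEGV
    exit 0;
}

waitpid($pid, 0);
my $killed_by = WIFSIGNALED($?) ? WTERMSIG($?) : 0;
print "child died from signal $killed_by, not a Perl exception\n";
```

`eval` only catches Perl-level exceptions (`die`); a segfault kills the interpreter itself, so there is nothing left to resume.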

Therefore you solve your problem via a "backup" mechanism: a second process ready to pick up the shattered pieces and make sure that the show goes on, as follows:

  • Just prior to attempted DB connection, you fork off a child process.

    Please note that you may need to daemonize the child process right after forking, as the parent's death can kill the child. It's ~2am so my brain's too asleep to be sure whether that's the case or not. If it is, daemonization has been covered numerous times on SO.

  • The parent process attempts the DB connection, timed out by a standard Perl alarm (set to, say, 10 seconds or whatever you want).

  • If the connection was successful, the parent sends a signal to the child to indicate that the DB connection is OK, and the parent code can proceed as usual.

    If the connection fails, the parent either coredumps, or issues a die (assuming the connection failed in a non-coredump way, or timed out).

  • The child process, right after forking, sets the signal handler to catch the parent's "DB connection successful" signal, and then sets the alarm for 10+epsilon seconds.

  • If the child process catches the parent's "DB connection successful" signal, the child process dies.

  • On the other hand, if the child process's alarm handler is triggered, it means the parent couldn't connect to the DB. Therefore, the child process sends the parent a signal meaning "I am about to do the fail-over", and then executes the fail-over logic as needed.

The above logic is very rough; it may need to be refined to deal with race conditions, perhaps by virtue of a lock obtained by both processes before using the data.
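A rough sketch of the scheme above. Here `try_db_connect()` is a hypothetical stand-in for the real `DBI->connect()` call; it simply succeeds, so this shows the happy path, and the timeouts and the 1-second grace sleep are arbitrary:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the real DBI->connect() attempt.
sub try_db_connect { return 1 }

my $child = fork();
die "fork failed: $!" unless defined $child;

if ($child == 0) {
    # Watchdog child: wait for the parent's "DB connection OK" signal,
    # or run the fail-over if the alarm fires first.
    my $db_ok = 0;
    $SIG{USR1} = sub { $db_ok = 1 };
    $SIG{ALRM} = sub { };           # only needs to interrupt sleep()
    alarm(11);                      # 10-second connect timeout + epsilon
    sleep;                          # block until either signal arrives
    exit 0 if $db_ok;               # parent connected; watchdog not needed
    print "fail-over: parent never connected, using files instead\n";
    exit 1;
}

sleep 1;    # crude guard against signalling the child before its
            # handlers are installed (one of the race conditions above)
if (try_db_connect()) {
    kill 'USR1', $child;            # tell the watchdog all is well
    waitpid($child, 0);
}
my $watchdog_status = $? >> 8;
print "connected; watchdog exited with status $watchdog_status\n";
# Had try_db_connect() segfaulted instead, the watchdog's alarm would
# fire and its fail-over branch would run.
```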

UPDATE

As noted in the comments, a variation of the above where the CHILD process tries to establish a DB connection and the parent process catches problems and acts as a fall-back may be somewhat easier to implement.
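A sketch of that inverted variation; the child's crash is again simulated with `kill 'SEGV'` rather than a real bad `DBI->connect()`:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(WIFSIGNALED WIFEXITED WEXITSTATUS);

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child makes the risky DB connection attempt; simulate the crash.
    kill 'SEGV', $$;       # stand-in for the segfaulting DBI->connect()
    # ...on success the child would fetch the data and exit 0...
    exit 0;
}

# Parent stays in the normal flow and just inspects how the child ended.
waitpid($pid, 0);
my $use_fallback = WIFSIGNALED($?)                      # killed by a signal
               || (WIFEXITED($?) && WEXITSTATUS($?));   # or exited non-zero

if ($use_fallback) {
    print "DB path failed; switching to the file-based approach\n";
    # ...file-based fallback goes here...
}
```

This keeps all the recovery logic in the surviving parent, which is the pattern the comments below recommend.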

DVK
  • Chaining off into child processes would confuse any monitoring scripts watching for the parent's death. It's better to fork off a child responsible for proxying requests to the database. – bdonlan Jan 11 '11 at 07:07
  • Incidentally, there's code on CPAN that could be adapted for this already: http://search.cpan.org/~timb/DBI-1.616/lib/DBI/ProxyServer.pm – bdonlan Jan 11 '11 at 07:08
  • thanks for the reply. It was a problem with the specific Perl version I was using: 5.8.4 causes this problem. Switching to 5.10 solves it. But interestingly, a colleague also suggested that I build a wrapper around the original script. – Tumer Jan 11 '11 at 09:17
  • Meh. Sore spot there. I'm on 5.8. And likely to stay that way for a while :( – DVK Jan 11 '11 at 12:05
  • To second bdonlan, it's much easier for a parent script to `waitpid` on the child and check `$?` for a core dump than the other way around. Since it is expected on UNIX-like systems for the parent to check the exit status of the child, it makes sense to use the common pattern. – Ven'Tatsu Jan 11 '11 at 14:58
  • Makes sense. My idea behind the opposite flow was that the parent process would be the one that remains in the normal flow, thus keeping the same PID (which is usually useless anyway, so it's not really a win). – DVK Jan 11 '11 at 15:13
  • Actually, even though most people agree that this is not recommended practice, it actually IS possible in some languages. Please look at this for complete code in C: http://stackoverflow.com/questions/8401689/best-practices-for-recovering-from-a-segmentation-fault/12824168#12824168 Summary: the way to do it is to use setjmp() and longjmp() from a signal handler. – Mr. Developerdude Sep 06 '13 at 01:14
  • @LennartRolland - nice info! Not really applicable here, since it's the Perl interpreter that segfaults, but good to know for C! – DVK Sep 06 '13 at 02:13
  • Another option springs to mind. I used it while working on a custom webserver written in C++ while it was in "production". I made a small script that launched the program under GDB. When it crashed, instead of coredumping it would trap in GDB, giving you access to the running state. For my webserver this was very useful. I guess a similar "lean mean wrapping trapping watchdog executable" could be made to take control of the interpreter when it fails. – Mr. Developerdude Sep 06 '13 at 13:42