I'm currently working on a project that runs on a heavily modified version of Linux, patched to access a VMEbus. Most of the bus handling is done: I have a VMEAccess class that uses mmap to write to a specific address of /dev/mem so that a driver can pull that data and push it onto the bus.

When the program starts, it has no idea where the slave board it's looking for is located on the bus, so it must find it by poking around: it tries to read every address one by one. If a device is connected there, the read method returns some data; if nothing is connected, a SIGBUS signal is sent to the program.

I tried several solutions (mostly using signal handling) but after some time, I decided on using jumps. The first longjmp() call works fine but the second call to VMEAccess::readWord() gives me a Bus Error even though my handler should prevent the program from crashing.

Here's my code:

#include <iostream>
#include <string>
#include <sstream>
#include <csignal>
#include <cstdio>   // printf
#include <cstdlib>
#include <csetjmp>
#include <unistd.h> // sleep

#include "types.h"
#include "VME_access.h"

VMEAccess *busVME;

int main(int argc, char const *argv[]);
void catch_sigbus (int sig);
void exit_function(int sig);

volatile BOOL bus_error;
volatile UDWORD offset;
jmp_buf env;

int main(int argc, char const *argv[])
{
    struct sigaction sigIntHandler;

    sigIntHandler.sa_handler = exit_function;
    sigemptyset(&sigIntHandler.sa_mask);
    sigIntHandler.sa_flags = 0;

    sigaction(SIGINT, &sigIntHandler, NULL);

    /* Install the SIGBUS handler used while probing the bus */
    struct sigaction sigBusHandler;

    sigBusHandler.sa_handler = catch_sigbus;
    sigemptyset(&sigBusHandler.sa_mask);
    sigBusHandler.sa_flags = 0;

    sigaction(SIGBUS, &sigBusHandler, NULL);

    busVME = new VMEAccess(VME_SHORT);

    offset = 0x01FE;

    setjmp(env);
    printf("%d\n", sigismember(&sigBusHandler.sa_mask, SIGBUS));

    busVME->readWord(offset);
    sleep(1);

    printf("%#08x\n", offset+0xC1000000);

    return 0;
}

void catch_sigbus (int sig)
{
    offset++;
    printf("%#08x\n", offset);
    longjmp(env, 1);
}

void exit_function(int sig) 
{
    delete busVME;
    exit(0);
}
Tzig
  • Unfortunately I can't; SIGBUS is the defined behavior in this case. Since I'm trying to read from a board that doesn't exist, I get this error. It can't be avoided, at least not without creating my own drivers/kernel for the FPGA – Tzig Dec 18 '17 at 11:05
  • [Don't `longjmp` out of a signal handler](https://stackoverflow.com/questions/7334595/longjmp-out-of-signal-handler). Instead use `sigsetjmp` and `siglongjmp`. See [the manual page](http://man7.org/linux/man-pages/man3/longjmp.3.html) for more information. – Some programmer dude Dec 18 '17 at 11:17
  • Whoa! Thanks, it worked! It was actually pretty easy to solve. Thank you very much. Can you use the actual answer button so I can mark it solved? – Tzig Dec 18 '17 at 13:39
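
The `sigsetjmp`/`siglongjmp` suggestion in the comment above can be sketched roughly as follows. On Linux with glibc, plain `setjmp` does not save the signal mask, and a handler installed without `SA_NODEFER` runs with its own signal blocked, so after a plain `longjmp` out of the handler SIGBUS stays blocked. That is the likely reason the second probe in the question fails. Passing a nonzero second argument to `sigsetjmp` saves the mask so that `siglongjmp` restores it. In this sketch `fake_read_word`, `find_board` and the `present` parameter are hypothetical stand-ins for `VMEAccess::readWord()`, and `raise(SIGBUS)` only simulates a missing board.

```cpp
#include <csetjmp>
#include <csignal>

static sigjmp_buf probe_env;

// Handler: jump back to the probe loop. siglongjmp restores the signal
// mask saved by sigsetjmp(probe_env, 1), so SIGBUS is unblocked again.
extern "C" void on_sigbus(int) {
    siglongjmp(probe_env, 1);
}

// Hypothetical stand-in for the real bus read: every address except
// `present` raises SIGBUS, as an empty VME slot would.
static unsigned fake_read_word(unsigned addr, unsigned present) {
    if (addr != present) {
        raise(SIGBUS);
    }
    return 0xBEEF;
}

// Returns the first address in [first, last] that answers, or -1.
long find_board(unsigned first, unsigned last, unsigned present) {
    struct sigaction sa {};
    struct sigaction old {};
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGBUS, &sa, &old);

    // Locals modified between sigsetjmp and siglongjmp must be volatile
    // to keep a defined value after the jump.
    volatile unsigned addr = first;
    volatile long found = -1;
    for (; addr <= last; addr = addr + 1) {
        if (sigsetjmp(probe_env, 1) == 0) {
            (void)fake_read_word(addr, present);
            found = addr;        // the read returned: something answered
            break;
        }
        // Reached via siglongjmp: nothing at addr, try the next one.
    }

    sigaction(SIGBUS, &old, nullptr);   // restore the previous handler
    return found;
}
```

Calling `find_board(0x01FE, 0x0210, 0x0205)` walks through several simulated bus errors before stopping at the address that responds. The caveat quoted in the answer below about calling unsafe functions after the jump still applies.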

1 Answer


As mentioned in the comments, using longjmp in a signal handler is a bad idea. After jumping out of a signal handler, your program is effectively still in the signal handler, so calling non-async-signal-safe functions, for example, leads to undefined behavior. Using siglongjmp won't really help here; quoting man signal-safety:

If a signal handler interrupts the execution of an unsafe function, and the handler terminates via a call to longjmp(3) or siglongjmp(3) and the program subsequently calls an unsafe function, then the behavior of the program is undefined.

And just as an example, this (siglongjmp) did cause some problems in the libcurl code in the past; see here: error: longjmp causes uninitialized stack frame

I'd suggest using a regular loop and modifying the exit condition in the signal handler (you modify the offset there anyway) instead. Something like the following (pseudo-code):

volatile sig_atomic_t had_sigbus = 0;  // set from the signal handler

int main(int argc, char const *argv[])
{
    ...
    for (offset = 0x01FE; offset is sane; ++offset) {
        had_sigbus = 0;
        probe(offset);
        if (!had_sigbus) {
            // found
            break;
        }
    }
    ...
}

void catch_sigbus(int)
{
    had_sigbus = 1;
}

This way it's immediately obvious that there is a loop, and the whole logic is much easier to follow. And there are no jumps, so it should work for more than one probe :) But obviously probe() must handle the failed call (the one interrupted with SIGBUS) internally too, and probably return an error. If it does return an error, the had_sigbus flag might not be necessary at all.
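
A runnable version of the flag-based loop above could look like the sketch below. It rests on an important assumption: that control returns to probe() normally after the handler runs, which is simulated here with `raise(SIGBUS)`. If the signal instead comes from a faulting memory access, POSIX leaves the behavior undefined once the handler returns, and on Linux the faulting access is typically retried. The names `flag_sigbus`, `probe`, `scan` and the `present` parameter are hypothetical.

```cpp
#include <csignal>

// Set by the handler, cleared by the loop before each probe.
static volatile std::sig_atomic_t had_sigbus = 0;

extern "C" void flag_sigbus(int) {
    had_sigbus = 1;
}

// Hypothetical stand-in for the real bus access: addresses other than
// `present` raise SIGBUS. Returns true if the access succeeded.
static bool probe(unsigned offset, unsigned present) {
    if (offset != present) {
        raise(SIGBUS);   // the handler runs before raise() returns
    }
    return had_sigbus == 0;
}

// Returns the first offset in [first, last] that answers, or -1.
long scan(unsigned first, unsigned last, unsigned present) {
    struct sigaction sa {};
    struct sigaction old {};
    sa.sa_handler = flag_sigbus;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGBUS, &sa, &old);

    long found = -1;
    for (unsigned offset = first; offset <= last; ++offset) {
        had_sigbus = 0;                 // reset before *each* probe
        if (probe(offset, present)) {
            found = offset;
            break;
        }
    }

    sigaction(SIGBUS, &old, nullptr);   // restore the previous handler
    return found;
}
```

Because the handler returns normally here, SIGBUS is unblocked again after every probe and no jump buffer is needed.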

dvk
  • I already tried something like this, actually, but for some reason when the program probes the right address it doesn't exit the loop, although when I start the program with the right value in offset it does exit. Weird – Tzig Dec 18 '17 at 14:08
  • Are you sure you haven't forgotten to set the flag to 0 before *each* probe? – dvk Dec 18 '17 at 14:09
  • I am, yes, and when I try to see what's going on using cout or printf nothing prints, even after an fflush – Tzig Dec 18 '17 at 14:32
  • Sounds like it gets stuck somewhere in the probing function. It's hard to tell what might have gone wrong, but I suspect not-quite-correct error handling, since that's the difference here. If you jump out of the signal handler, the function that probes the bus just stops executing when SIGBUS is sent. With the flag approach some syscall will return an error and control will be passed back to the function, which should handle that error correctly. – dvk Dec 18 '17 at 14:48
  • But the probing function actually relies on a function from the kernel (and I can't make any change there), so I think I'm "stuck" with siglongjmp (but since I code everything else and I don't plan on using any mutex or anything else, it should be okay) – Tzig Dec 18 '17 at 14:52