I need to see more of your code, but your statement that "All other threads simply block all signals" raises...signals.
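You have to remember that most system calls were created before the concept of threads existed, and signal handling is one of them. Handlers are shared by the whole process, the older process-oriented calls such as sigprocmask() have unspecified behavior once there is more than one thread, and a process-directed signal is only ever delivered to a thread that does not have it blocked. So when a thread blocks all signals, it will never be the one interrupted by the profiler's timer.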
In fact, check out the signal(2) manpage:
"The effects of signal() in a multithreaded process are unspecified."
Yes, this is sad, but it is the price you must pay for using a low-overhead statistical sampling profiler. And working around it is very easy: just remove SIGPROF (or SIGALRM if you are using the REAL mode) from your signal mask set and you should be fine.
And in general, unless you absolutely have to, you should not be doing process-level signal masking in anything other than the main thread, where "main" doesn't necessarily mean the thread that main() is running in, but rather the thread you consider the "boss" of all the others, for reasons you have made all too clear already. :)
You can also try pthread_sigmask, the pthread counterpart of sigprocmask. POSIX defines it as changing the mask of the calling thread only, and a new thread starts with a copy of its creator's mask, so a child thread that needs SIGPROF has to remove it from its own mask after it starts. I have not tested how well that works in your setup.