Hmmm, MPI_Recv claims to be blocking unless you do something specific to change that. MPI's underlying comms infrastructure is elaborate, and I don't know whether 'blocking' extends as far as waiting on a network socket with a call to select(). You're sort of stating that it doesn't, which I can well believe given MPI's complexity.
MPI's Internals
So if MPI_Recv in blocking mode inevitably involves polling, one needs to work out exactly what the library underneath is doing. Hopefully it's a sensible poll (i.e. one involving a call to nanosleep()). You could look at the Open MPI source code for that (eek), or use this and GTKWave to see what its scheduling behaviour is like in a nice graphical way (I'm assuming you're on Linux).
If it is sleeping in the polling loop then the version of the Linux kernel matters. More modern kernels (possibly requiring the PREEMPT_RT patch set - I'm afraid I can't remember) do a proper timer-driven, de-scheduled sleep even for short periods, so taking no CPU time. Older implementations would just go into a busy loop for short sleeps, which is no good to you.
If it's not sleeping at all then it's going to be harder: you'd have to use MPI in a non-blocking mode and do the polling / sleeping yourself, as sketched below.
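A minimal sketch of that approach, assuming a toy setup where rank 0 sends and rank 1 receives; the buffer size, tag and 100 microsecond pause are placeholders, not anything from your code:

    /* Non-blocking receive polled with a short nanosleep() between checks,
       so the waiting core is de-scheduled rather than spinning. */
    #include <mpi.h>
    #include <time.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1) {
            double buf[1024];
            MPI_Request req;
            MPI_Irecv(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);

            int done = 0;
            struct timespec pause = { .tv_sec = 0, .tv_nsec = 100000 }; /* 100 us */
            while (!done) {
                MPI_Test(&req, &done, MPI_STATUS_IGNORE);
                if (!done)
                    nanosleep(&pause, NULL); /* sleep instead of busy-waiting */
            }
            printf("rank 1: message received\n");
        } else if (rank == 0) {
            double buf[1024] = { 0 };
            MPI_Send(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

The pause length is a trade-off between latency and wasted wake-ups; the point is that the kernel gets a chance to run one of your worker threads while you wait.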
Thread Priorities
Once you've got either your code or MPI's polling with a sleep, you can then rely on thread priorities and the OS scheduler to sort things out. In general, putting the I/O thread at a higher priority than the worker threads is a good idea: it prevents the process at the other end of the I/O from being held up by your worker threads pre-empting your I/O thread. For the same reason sched_yield() isn't a good substitute for sleeping, because the scheduler won't actually put your thread to sleep.
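As a rough illustration (not from your code), creating the I/O thread under SCHED_FIFO while leaving the workers on the default policy might look like this; real-time scheduling usually needs root or CAP_SYS_NICE, and the priority value and thread functions are just placeholders:

    /* Sketch: give the I/O thread a real-time priority above the workers. */
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static void *io_thread(void *arg)   { return NULL; } /* poll MPI / sleep here */
    static void *work_thread(void *arg) { return NULL; } /* number crunching here */

    int main(void)
    {
        pthread_attr_t io_attr;
        struct sched_param io_prio = { .sched_priority = 10 }; /* placeholder value */

        pthread_attr_init(&io_attr);
        pthread_attr_setinheritsched(&io_attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&io_attr, SCHED_FIFO);
        pthread_attr_setschedparam(&io_attr, &io_prio);

        pthread_t io, workers[4];
        int rc = pthread_create(&io, &io_attr, io_thread, NULL);
        if (rc != 0)
            fprintf(stderr, "couldn't create real-time I/O thread (permissions?)\n");

        /* Worker threads stay on the default SCHED_OTHER policy. */
        for (int i = 0; i < 4; i++)
            pthread_create(&workers[i], NULL, work_thread, NULL);

        if (rc == 0)
            pthread_join(io, NULL);
        for (int i = 0; i < 4; i++)
            pthread_join(workers[i], NULL);
        return 0;
    }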
Thread Affinity
In general I wouldn't bother with that, at least not yet. You've got 5 threads and 4 cores, so one of those threads will always be disappointed. If you let the kernel sort things out as best it can then, provided you've got control of the polling (as described above), you should be fine.
--EDIT--
I've gone and had another look at MPI and threads, and re-discovered why I didn't like it. MPI intercommunicates between processes, each of which has a 'rank'. Whilst MPI is (or can be) thread-safe, a thread doesn't have its own rank, so MPI can't address individual threads as communication endpoints. That's a bit of a weakness in MPI in these days of multi-core devices.
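For what it's worth, you can ask MPI for its fullest thread support level, but whatever level it grants, the rank stays per-process. A sketch (the constants are standard MPI; the rest is illustrative):

    /* Sketch: request full thread support; the rank is still per-process,
       so individual threads can't be addressed as endpoints. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (provided < MPI_THREAD_MULTIPLE)
            fprintf(stderr, "rank %d: full thread support not available\n", rank);

        /* Any thread in this process that calls MPI_Send()/MPI_Recv() does so
           as rank `rank`; there is no per-thread rank to send to. */

        MPI_Finalize();
        return 0;
    }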
However, you could have 4 separate processes and no I/O thread. That's likely to be less than optimal in terms of how much data is copied, moved and stored (it'll be 4x the network traffic, 4x the memory used, etc.). But if you've got a large enough compute-time to I/O-time ratio you might be able to stand that inefficiency for the sake of simple source code.