A memory barrier (also known as a fence) is a hardware operation which
ensures the ordering of different reads and writes to the globally
visible store. On a typical modern processor, memory accesses are
pipelined, and may occur out of order. A memory barrier prevents such
reordering across it. A full memory barrier will ensure that all loads
and stores which precede it occur before any load or store which follows
it. (Many processors also support partial barriers; e.g. on a SPARC, a
membar #StoreStore instruction ensures that all stores which occur
before it will be visible to all other processors before any store which
occurs after it.)
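
As a rough illustration of what a barrier buys you, here is a minimal C++11 sketch using std::atomic_thread_fence. The names payload, ready and message_passing_demo are made up for this example. The release fence plays a role loosely similar to the partial barrier described above (earlier stores may not move past the later store), and the acquire fence is its counterpart on the reading side. How the compiler maps these fences to actual instructions depends on the target processor.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> ready{false};   // flag used to publish the data

int message_passing_demo() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        payload = 42;  // (1) write the data
        // Release fence: the write in (1) may not be reordered after (2).
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);  // (2) publish
    });

    std::thread consumer([&seen] {
        // The fences never block; waiting for the flag is done by this loop.
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Acquire fence: the read below may not be reordered before the
        // load of the flag above.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;  // sees the value written in (1)
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling message_passing_demo() from main returns the value the consumer read. Without the two fences (or equivalent acquire/release ordering on the flag itself), the read of payload would be a data race.
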
That's all a memory barrier does. It doesn't block the thread or
anything like that.
Mutexes and semaphores are higher level primitives, normally implemented
with the help of the operating system. A thread which requests a lock on
a mutex that is already held will block, and have its execution
suspended by the OS, until that mutex is free. The code which implements
a mutex will contain memory barrier instructions, but it does much more.
A memory barrier instruction stalls at most the core which executes it,
and only until the necessary conditions have been met: typically
nanoseconds, a microsecond or so at the most. When you try to
lock a mutex, and another thread already has it, the OS will suspend
your thread (and only your thread—the processor continues to
execute other threads) until whoever holds the mutex frees it, which
could be seconds, minutes or even days. (Of course, if it's more than a
few hundred milliseconds, it's probably a bug.)
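
To make the blocking behaviour concrete, here is a small sketch with std::mutex, under the assumption of a C++11 compiler with thread support. The names m, blocked_millis and attempting are hypothetical. One thread holds the mutex for about 100 milliseconds while a second thread tries to lock it; the function reports how long the second thread spent waiting.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

std::mutex m;  // hypothetical name, for illustration only

// Returns how long the second thread waited for the mutex, in milliseconds.
long long blocked_millis() {
    using steady = std::chrono::steady_clock;
    long long waited = 0;
    std::atomic<bool> attempting{false};

    std::unique_lock<std::mutex> holder(m);  // the calling thread locks first

    std::thread waiter([&] {
        auto start = steady::now();
        attempting.store(true);
        // The OS suspends this thread here until the holder unlocks.
        std::lock_guard<std::mutex> lock(m);
        waited = std::chrono::duration_cast<std::chrono::milliseconds>(
                     steady::now() - start).count();
    });

    // Wait until the second thread is about to request the lock.
    while (!attempting.load()) {
        std::this_thread::yield();
    }
    // Keep holding the mutex for a while; the calling thread keeps running.
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    holder.unlock();

    waiter.join();
    return waited;
}
```

The value returned should be around 100 or a little more, which is several orders of magnitude longer than any stall caused by a barrier instruction.
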
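Finally, there's not really much difference between semaphores and
mutexes; to a first approximation, a mutex can be considered a semaphore
with a count of one. (The main practical difference is that a mutex
usually has an owner: only the thread which locked it may unlock it.)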
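
The sketch below illustrates the "semaphore with a count of one" idea. The Semaphore class is a hypothetical, minimal one written for this example on top of std::mutex and std::condition_variable so that it compiles as C++11; C++20 provides std::counting_semaphore and std::binary_semaphore in the header semaphore, which would normally be used instead. The function count_with_semaphore is also a made-up name.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal counting semaphore, for illustration only.
class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) {}

    void acquire() {  // wait until the count is positive, then take one
        std::unique_lock<std::mutex> lock(guard_);
        available_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    void release() {  // give one back and wake a waiting thread, if any
        {
            std::lock_guard<std::mutex> lock(guard_);
            ++count_;
        }
        available_.notify_one();
    }

private:
    std::mutex guard_;
    std::condition_variable available_;
    int count_;
};

// Several threads increment a plain int, protected only by a semaphore
// whose initial count is one, i.e. used exactly like a mutex.
int count_with_semaphore(int threads, int iterations) {
    Semaphore sem(1);
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                sem.acquire();  // "lock"
                ++counter;
                sem.release();  // "unlock"
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

With an initial count of one, at most one thread is between acquire() and release() at any time, so count_with_semaphore(4, 10000) returns 40000. Unlike a mutex, nothing here stops a thread from calling release() without having called acquire() first.
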