
People typically mention mfence in the context of lock-free programming with atomics, that is, code that avoids the system synchronization primitives which make requests to the kernel. Does that mean the mfence instruction is not needed when I write user-space code that uses kernel synchronization primitives? If so, why?

Also, does the int instruction perform an implicit mfence?

I would be very happy to learn about this, because the documentation I have found on the web regarding int and sysenter says nothing about mfence.

Peter Cordes
Konstantin Burlachenko
  • C++ programmers work close to the metal, which is why the tag is here. But maybe you are right, @user17732522: this question is really about x64 assembly. – Konstantin Burlachenko Jan 28 '22 at 22:36
  • Ok, kindly thanks @user17732522 – Konstantin Burlachenko Jan 28 '22 at 22:43
  • Are you sure you mean `sysenter`? It's only used from 32-bit user-space. The 64-bit system call instruction is `syscall`. – Peter Cordes Jan 29 '22 at 04:34
  • Anyway, [neither syscall nor sysenter are serializing](https://stackoverflow.com/questions/50323347/how-many-memory-barriers-instructions-does-an-x86-cpu-have), so any memory barriering depends on how the kernel implements whatever system call you invoked. `int` is also not fully serializing, although `iret` is. It is guaranteed like `lfence` to not execute later instructions until all earlier ones are executed: https://www.felixcloutier.com/x86/intn:into:int3:int1 - but not drain the store buffer. – Peter Cordes Jan 29 '22 at 04:35
  • I would expect that whatever synchronization system call you're using will execute any necessary barriers within its actual kernel code, so that you wouldn't rely on int/syscall/whatever to handle it. If it's some kind of mutex system call, for instance, then the lock had better have an acquire barrier, and the unlock a release barrier (on architectures where that isn't already automatic). If not, hopefully it's very clearly documented that you need your own barriers. One major reason for using such primitives in the first place would be to avoid thinking about memory order at all. – Nate Eldredge Jan 29 '22 at 06:18
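
To illustrate the point about lock/unlock carrying their own acquire/release ordering, here is a minimal C++ sketch. The function name `locked_counter_demo` and its parameters are made up for illustration. It assumes only what the C++ standard guarantees for `std::mutex` (an unlock synchronizes with the next lock of the same mutex); how a given library or kernel implements that is not shown here.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical demo: several threads increment a plain (non-atomic) counter
// under a std::mutex. No mfence or other explicit fence appears in user code;
// the ordering comes from the lock's acquire and the unlock's release.
long locked_counter_demo(int num_threads, int iters_per_thread) {
    assert(num_threads > 0 && iters_per_thread >= 0);
    long counter = 0;  // plain variable, protected only by the mutex
    std::mutex m;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters_per_thread; ++i) {
                std::lock_guard<std::mutex> guard(m);  // lock(): acquire
                ++counter;
            }  // guard is destroyed at the end of each iteration: unlock(), release
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Calling `locked_counter_demo(4, 10000)` should return 40000 on every run, because each increment happens-after the previous one through the mutex.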

1 Answer

  1. Neither the SYSCALL instruction nor the SYSRET instruction is fully serializing.

    Each guarantees that later instructions will not execute until all earlier ones have completed execution, but it does not drain the store buffer.

    So the memory ordering depends on how the kernel implements whatever system call you invoked.

  2. The INT instruction is also not fully serializing, but the IRET instruction is.

I really appreciate your help! @Peter Cordes
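
Since the entry instruction itself is not a memory barrier, user-space code that needs ordering should state it explicitly rather than lean on a kernel transition. Here is a minimal C++ sketch under that assumption. The helper name `publish_and_read` is hypothetical, and `std::this_thread::yield()` only stands in for "something that may enter the kernel"; whether it actually makes a system call depends on the implementation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one thread publishes `value`, another reads it.
// The ordering comes from the release store / acquire load pair on `ready`,
// not from any kernel entry that happens in between.
int publish_and_read(int value) {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        payload = value;
        std::this_thread::yield();   // may enter the kernel; NOT relied on for ordering
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;              // ordered after the producer's write to payload
    });

    producer.join();
    consumer.join();
    assert(seen == value);
    return seen;
}
```

On x86 a release store and an acquire load typically compile to plain `mov` instructions, so this costs nothing extra there, while still being correct on weaker architectures.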

ioworker0
  • Was this supposed to be a reply to me? You posted this answer with an @Peter Cordes. I don't understand your phrasing, it confirms exactly what I commented, that it orders instruction execution (like `lfence`) but not memory visibility. How is this "your opinion"? It's a direct quote from the manual. – Peter Cordes Apr 10 '23 at 10:28
  • The only reason I never answered this question is that it asked about `sysenter` in 64-bit mode, where it's basically never used. (Although it is apparently supported.) I was expecting the OP to fix their question to ask about `syscall` since they were asking about system calls in existing OSes, but they never did, so it's still a weird question. – Peter Cordes Apr 10 '23 at 10:31
  • @Peter Cordes Thanks for your reply. You wrote "Neither SYSCALL nor SYSENTER are serializing" and "INT is also not fully serializing", and I'm still a little confused about it. As I understand it, SYSCALL, SYSENTER and INT are not fully serializing; they give a guarantee like `lfence`. – ioworker0 Apr 12 '23 at 09:55
  • That's correct. So they do basically nothing for *memory* ordering, except for not executing later NT loads early. (And maybe waiting for earlier NT loads from WC memory to produce a value.) x86's memory model for normal loads/stores on WB memory is already quite strong, program-order + a store buffer with store-forwarding. The only meaningful barrier is `mfence` which drains the store buffer before later instructions, or `sfence` which blocks later stores until after all earlier stores including NT stores. SYSCALL and SYSENTER aren't either of those, so they're not *memory* barriers at all – Peter Cordes Apr 12 '23 at 10:02
  • See also [Does the Intel Memory Model make SFENCE and LFENCE redundant?](https://stackoverflow.com/q/32705169) - yes, except for NT loads and stores. Also notice this question you answered asks if they're equivalent to `mfence`, and the answer is "no" according to your quote from the manual. Of course your answer doesn't make any claims about whether it is or isn't a barrier or what that means, so it's not wrong on its own. But I got the impression you were disagreeing with my comment. I don't think I've mis-stated anything in my comments now or earlier on this Q&A. – Peter Cordes Apr 12 '23 at 10:03
  • @Peter Cordes Thanks. I totally agree with you. (SYSCALL and SYSENTER aren't memory barriers at all.) I was a little bit confused about it. (SYSCALL and SYSENTER aren't serializing, or they're not fully serializing.) – ioworker0 Apr 12 '23 at 10:34
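
The one reordering that the x86 store buffer permits for normal loads and stores (a store followed by a load from a different address) can be shown with the classic store-buffer litmus test. Here is a minimal C++ sketch. The function name `count_both_zero` is hypothetical. It uses `std::memory_order_seq_cst`, for which the C++ memory model forbids both threads reading 0; on x86, compilers typically implement such a store with `xchg` or `mov` + `mfence`, but that code generation is an assumption about the toolchain, not something this code checks.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus test: each thread stores 1 to its own flag, then loads
// the other thread's flag. Returns how many of `rounds` runs saw r1 == 0 and
// r2 == 0, i.e. both loads ran ahead of both stores becoming visible.
int count_both_zero(int rounds) {
    assert(rounds > 0);
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread a([&] {
            x.store(1, std::memory_order_seq_cst);   // full barrier semantics
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

With `seq_cst` the count is always 0. Weakening the stores to `memory_order_release` and the loads to `memory_order_acquire` makes the both-zero outcome legal, and it can then show up on real x86 hardware, because nothing drains the store buffer between the store and the load.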