I want to know how the Linux kernel disables x86 SMAP while executing the copy_from_user() function. I looked through the source code but could not find where this happens.

Supervisor Mode Access Prevention (SMAP) is a security feature of x86 CPUs to prevent the kernel from accessing unintended user-space memory, which helps to fend off various exploits.

Peter Cordes
YvG3
  • I would appreciate it if you could show me the relevant code – YvG3 Apr 26 '20 at 12:45
  • Could you please expand the "SMAP" acronym? – KamilCuk Apr 26 '20 at 13:11
  • Supervisor Mode Access Prevention (SMAP) is a security feature of Intel CPUs that prevents the kernel from accessing unintended user-space memory, which in turn helps fend off various exploits. – YvG3 Apr 26 '20 at 15:29
  • The copy_from_user function copies data from user space to kernel space, but with SMAP protection it is illegal to access user-space memory from kernel space. So when it actually copies the data, the kernel temporarily disables SMAP protection so that the data can be transferred from user space, and when the transfer is done, SMAP protection is enabled again. – YvG3 Apr 26 '20 at 15:36
  • More information can be found at https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention – YvG3 Apr 26 '20 at 15:45
  • This is a very broad question since the answer depends on the architecture. Are you talking about a specific architecture (e.g. x86)? If so, please state it in the question. If not, then it's definitely too broad to answer: every architecture that supports SMAP has its own way of enabling/disabling it. – Marco Bonelli Apr 27 '20 at 08:37
  • Thanks for your comments. I have updated the description of the problem. – YvG3 Apr 27 '20 at 11:12

1 Answer

As documented in the Wikipedia page that you linked:

SMAP is enabled when memory paging is active and the SMAP bit in the CR4 control register is set. SMAP can be temporarily disabled for explicit memory accesses by setting the EFLAGS.AC (Alignment Check) flag. The stac (Set AC Flag) and clac (Clear AC Flag) instructions can be used to easily set or clear the flag.

The Linux kernel does exactly this to temporarily disable SMAP: it uses stac to set EFLAGS.AC before copying the data, and then uses clac to clear EFLAGS.AC when done.

The AC flag has existed since the 486 as the alignment-check flag for user-space loads/stores; SMAP overloads the meaning of that flag bit. stac/clac are new with SMAP and are only allowed in kernel mode (CPL=0): they fault in user space (and, on CPUs without SMAP, in kernel mode too).
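
To make the rule quoted above concrete, here is a minimal sketch in plain C of the condition under which SMAP blocks an access. It is only a model: the function name smap_blocks_access() and its parameters are hypothetical and not taken from the kernel, and it only covers explicit kernel-mode data accesses. The bit positions (CR4.SMAP is bit 21, EFLAGS.AC is bit 18) are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_SMAP  (UINT64_C(1) << 21)  /* CR4.SMAP: enables SMAP */
#define EFLAGS_AC (UINT64_C(1) << 18)  /* EFLAGS.AC: set by stac, cleared by clac */

/*
 * Hypothetical model (not kernel code): would an explicit kernel-mode
 * data access to a page fault because of SMAP?
 */
static bool smap_blocks_access(uint64_t cr4, uint64_t eflags, bool page_is_user)
{
    if (!page_is_user)
        return false;               /* SMAP only concerns user-accessible pages */
    if (!(cr4 & CR4_SMAP))
        return false;               /* SMAP is not enabled at all */
    return !(eflags & EFLAGS_AC);   /* blocked unless AC is set */
}
```

In this model, stac flips the last condition so that the access is allowed, and clac restores the protection.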


In theory it's pretty simple, but in practice the Linux kernel codebase is a jungle of functions, macros, inline assembly templates, etc. To find out exactly how this is done we can look at the source code, starting from copy_from_user():

  1. When copy_from_user() is called, it makes a quick check to see if the memory range is valid, then calls _copy_from_user()...

  2. ... which does another couple of checks and then calls raw_copy_from_user()...

  3. ... which, before doing the actual copy, calls __uaccess_begin_nospec()...

  4. ... which is just a macro that expands to stac(); barrier_nospec().

  5. Focusing on stac(), which is a simple inline function, we have:

     alternative("", __ASM_STAC, X86_FEATURE_SMAP);
    

The alternative() macro is fairly involved: it lets the kernel choose between alternative instruction sequences at boot time, based on CPU support. You can check the source file in which it is defined for a bit more information. Here it decides whether the kernel needs the stac instruction at all: old x86 CPUs do not support SMAP and therefore lack the instruction, so on those CPUs this just becomes a no-op.
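
As a rough illustration of what the patching amounts to, here is a sketch in plain C that treats one patch site as a 3-byte buffer. This is not how the kernel implements alternative(); apply_stac_alternative() and the buffer layout are made up for the example, and the 3-byte NOP encoding used as padding (0f 1f 00) is just one valid choice.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SITE_LEN 3

/* Encoding of stac: 0f 01 cb */
static const uint8_t stac_bytes[SITE_LEN] = { 0x0f, 0x01, 0xcb };
/* One valid 3-byte NOP encoding, standing in for the padding at the site */
static const uint8_t nop3_bytes[SITE_LEN] = { 0x0f, 0x1f, 0x00 };

/*
 * Hypothetical model of patching one site: the site starts out as NOP
 * padding and is overwritten with stac only if the CPU has the feature.
 */
static void apply_stac_alternative(uint8_t site[SITE_LEN], bool cpu_has_smap)
{
    if (cpu_has_smap)
        memcpy(site, stac_bytes, SITE_LEN);
    /* otherwise the padding NOP stays in place */
}
```

The point of doing it this way is that the hot path never has to test a feature flag at run time: after patching, it is either a stac or a NOP.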

Looking at the __ASM_STAC macro we see:

#define __ASM_STAC  ".byte 0x0f,0x01,0xcb"

This is the machine-code encoding of the stac instruction. It is emitted with the .byte directive instead of the mnemonic because the code needs to assemble even with old toolchains whose binutils doesn't know about these instructions.

Once, at boot, the kernel uses the cpuid instruction to check for X86_FEATURE_SMAP (bit 20 of ebx when cpuid is executed with eax=7, ecx=0 to get the extended features). If SMAP is available, the kernel rewrites the machine code at each such site so that the instruction becomes stac; otherwise the no-op is kept.
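
If you want to see the same feature bit from user space, something like the following should work with GCC or Clang. It is a sketch, not the kernel's detection code: ebx_reports_smap() and cpu_has_smap() are names made up for this example, and __get_cpuid_count() comes from the compiler's <cpuid.h>, so this assumes a GCC-compatible toolchain. On non-x86 targets the helper simply reports false.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header, x86 only */
#endif

/* CPUID leaf 7, subleaf 0: SMAP is reported in bit 20 of EBX */
#define CPUID_7_0_EBX_SMAP (UINT32_C(1) << 20)

/* Pure helper: decode the SMAP bit from an EBX value */
static bool ebx_reports_smap(uint32_t ebx)
{
    return (ebx & CPUID_7_0_EBX_SMAP) != 0;
}

/* Hypothetical user-space check; the kernel has its own CPUID code */
static bool cpu_has_smap(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Returns 0 if leaf 7 is not supported by this CPU */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return ebx_reports_smap(ebx);
#else
    return false;
#endif
}
```

Note that this only tells you whether the CPU supports SMAP, not whether the running kernel has enabled it in CR4.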

Once done with all of this madness (which really all just boils down to a single instruction), the actual copy from user memory is performed, and the __uaccess_end() macro is then used to re-enable SMAP. This macro uses alternative() in the same way as the one we just saw, and ends up executing clac (or a nop).
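
Putting the whole sequence together, the flow can be modelled in ordinary user-space C like this. Everything here is a stand-in: model_ac plays the role of EFLAGS.AC, model_stac()/model_clac() play the role of the real instructions, and model_copy_from_user() is a hypothetical function, not the kernel's implementation. The only thing borrowed from the real API is the return convention (number of bytes not copied, so 0 on success).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool model_ac;   /* stands in for EFLAGS.AC */

static void model_stac(void) { model_ac = true; }   /* "disable" SMAP checks */
static void model_clac(void) { model_ac = false; }  /* "re-enable" SMAP checks */

/*
 * Hypothetical model of the sequence: open the window, copy, close it.
 * Returns the number of bytes NOT copied, like copy_from_user().
 */
static unsigned long model_copy_from_user(void *to, const void *from,
                                          unsigned long n)
{
    model_stac();
    /* With AC clear, the real user-space access would fault under SMAP */
    assert(model_ac);
    memcpy(to, from, n);
    model_clac();
    return 0;
}
```

The window in which AC is set is kept as small as possible, which is why the stac/clac pair sits right around the copy itself rather than around the earlier checks.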

Marco Bonelli
  • You edited out the "for some reason" comment on `stac` / `clac` only being available on CPL0. It does bear comment since you can do the same thing in any mode with `pushf`/`popf`. If I had to guess, it's for performance, so `stac` can just toggle permission checking if SMAP's enabled, never alignment checking. (The alignment check effect of AC only applies to user-space, CPL3). IDK if it or popf serialize loads/stores; they might if the SMAP status is renamed. – Peter Cordes Feb 08 '22 at 17:22