97

I've been trying to understand assembly language code generated by GCC and frequently encounter this instruction at the start of many functions including _start(), but couldn't find any guide explaining its purpose:

0000000000001040 <_start>:
    1040:    f3 0f 1e fa             endbr64
    1044:    31 ed                   xor    ebp,ebp
janw
Mah35h
  • 6
    See [this pdf from intel](https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf) – Jester Jul 05 '19 at 15:41
  • 1
    You'll typically only find that in code like `_start` that was already in machine-code form which gcc *linked* into your executable (from `crt0.o` or whatever), not which gcc emitted from C source. – Peter Cordes Jul 06 '19 at 01:17
  • (Unless your GCC is configured with `-fcf-protection=branch` as the default, or you use that manually. See https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html and `-mmanual-endbr` in https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html) – Peter Cordes Jun 30 '21 at 17:38

2 Answers

84

It stands for "End Branch 64 bit" (there is also a 32-bit counterpart), or more precisely, Terminate Indirect Branch in 64-bit Mode.

Here is the operation:

IF EndbranchEnabled(CPL) & EFER.LMA = 1 & CS.L = 1
  IF CPL = 3
  THEN
    IA32_U_CET.TRACKER = IDLE
    IA32_U_CET.SUPPRESS = 0
  ELSE
    IA32_S_CET.TRACKER = IDLE
    IA32_S_CET.SUPPRESS = 0
  FI
FI;

If an indirect branch lands on an instruction other than ENDBR64, the TRACKER cannot be returned to IDLE and the CPU generates a #CP exception. In other words, if an attacker was able to change the destination address of an indirect jump, the program is very likely to terminate even if the destination is valid machine code.

The instruction is otherwise considered a NOP.

Put differently, the CET feature is used to make sure that your indirect branches actually go to a valid location. Here is the paragraph from Intel about it:

The ENDBRANCH (see Section 73 for details) is a new instruction that is used to mark valid jump target addresses of indirect calls and jumps in the program. This instruction opcode is selected to be one that is a NOP on legacy machines such that programs compiled with ENDBRANCH new instruction continue to function on old machines without the CET enforcement. On processors that support CET the ENDBRANCH is still a NOP and is primarily used as a marker instruction by the processor pipeline to detect control flow violations. The CPU implements a state machine that tracks indirect jmp and call instructions. When one of these instructions is seen, the state machine moves from IDLE to WAIT_FOR_ENDBRANCH state. In WAIT_FOR_ENDBRANCH state the next instruction in the program stream must be an ENDBRANCH. If an ENDBRANCH is not seen the processor causes a control protection exception (#CP), else the state machine moves back to IDLE state.
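The pseudocode and the state machine described above can be sketched together in a few lines of Python. This is only an illustrative model, not how the hardware is implemented: the `Cpu` and `CetMsr` classes and the `call_indirect` / `jmp_indirect` mnemonics are made-up names, and the sketch assumes that indirect branch tracking is already enabled in 64-bit mode (so the outer `IF` of the pseudocode is always true). It also ignores the effect of the SUPPRESS bit.

```python
from dataclasses import dataclass

IDLE, WAIT_FOR_ENDBRANCH = "IDLE", "WAIT_FOR_ENDBRANCH"


class ControlProtectionFault(Exception):
    """Stands in for the #CP exception (hypothetical name)."""


@dataclass
class CetMsr:
    """Only the two fields the ENDBR64 pseudocode touches."""
    tracker: str = IDLE
    suppress: int = 0


class Cpu:
    """Toy model; assumes tracking is enabled at every privilege level."""

    def __init__(self, cpl=3):
        self.cpl = cpl
        self.u_cet = CetMsr()  # models IA32_U_CET (CPL = 3)
        self.s_cet = CetMsr()  # models IA32_S_CET (CPL < 3)

    def _msr(self):
        return self.u_cet if self.cpl == 3 else self.s_cet

    def execute(self, mnemonic):
        msr = self._msr()
        # After an indirect branch, the very next instruction must be endbr64.
        if msr.tracker == WAIT_FOR_ENDBRANCH and mnemonic != "endbr64":
            raise ControlProtectionFault(f"expected endbr64, got {mnemonic}")
        if mnemonic == "endbr64":
            msr.tracker = IDLE
            msr.suppress = 0
        elif mnemonic in ("call_indirect", "jmp_indirect"):
            msr.tracker = WAIT_FOR_ENDBRANCH
        # Every other instruction leaves the tracker unchanged.


cpu = Cpu(cpl=3)
for insn in ["call_indirect", "endbr64", "xor"]:
    cpu.execute(insn)
print(cpu.u_cet.tracker)
```

Feeding the model an indirect branch followed by anything other than `endbr64` raises `ControlProtectionFault`, which mirrors the #CP case in the quoted paragraph.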


As a side note, it is possible to tell the processor not to require an ENDBR64 at the target of a particular indirect branch. This is done with a prefix (3Eh, the no-track prefix) on the branch instruction. This is useful for cases such as a switch where the addresses are in a table located in read-only memory. However, the CPU ignores that prefix in many cases.

Alexis Wilke
  • Address of **11th Generation Intel® Core™ Processor Datasheet, Volume 1 of 2 June 2021 Revision 006** mentioned in the answer does not work. A link valid for this month is https://cdrdv2.intel.com/v1/dl/getContent/631121 – vitsoft Jun 30 '21 at 07:17
  • @vitsoft Interestingly enough, the docs you referenced have the link I had here, which is indeed dead. Since I had a copy of the paragraph explaining the instruction, I think I'm okay without a link... – Alexis Wilke Jun 30 '21 at 16:04
71

endbr64 (and endbr32) are a part of Intel's Control-Flow Enforcement Technology (CET) (see also Intel Software Developer Manual, Volume 1, Chapter 18).

Intel CET offers hardware protection against Return-oriented Programming (ROP) and Jump/Call-oriented Programming (JOP/COP) attacks, which manipulate control flow in order to re-use existing code for malicious purposes.

Its two major features are

  • a shadow stack for tracking return addresses and
  • indirect branch tracking, which endbr64 is a part of.

While CET is just slowly becoming available with the current processor generation, it is already supported as of GCC 8 through the -fcf-protection option; GCC builds that enable this option by default insert endbrXX instructions automatically. The opcode is chosen to be a no-op on older processors, such that the instruction is ignored if CET is not supported; the same happens on CET-capable processors where indirect branch tracking is disabled.
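Because the marker has a fixed encoding, it is easy to recognise in a binary. The question's disassembly shows endbr64 as the bytes `f3 0f 1e fa`; the sketch below checks whether a chunk of machine code starts with it. The helper name `starts_with_endbr` is made up for illustration, and the endbr32 encoding (`f3 0f 1e fb`) does not appear in the question, so treat it as an assumption to verify against the Intel manual.

```python
from typing import Optional

ENDBR64 = bytes.fromhex("f30f1efa")  # bytes taken from the question's listing
ENDBR32 = bytes.fromhex("f30f1efb")  # assumed encoding of the 32-bit variant


def starts_with_endbr(code: bytes) -> Optional[str]:
    """Return the marker's name if `code` begins with one, else None."""
    if code.startswith(ENDBR64):
        return "endbr64"
    if code.startswith(ENDBR32):
        return "endbr32"
    return None


# First six bytes of _start from the listing: endbr64, then xor ebp,ebp.
start_bytes = bytes.fromhex("f30f1efa31ed")
print(starts_with_endbr(start_bytes))      # at the function entry
print(starts_with_endbr(start_bytes[4:]))  # at the xor that follows
```

The first call reports `endbr64`; the second returns `None`, since `31 ed` is the `xor` instruction rather than a marker.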


So what does endbr64 do?

Preconditions:

  • CET must be enabled by setting the control register flag CR4.CET to 1.
  • The appropriate flags for indirect branch tracking in the IA32_U_CET (user mode) or IA32_S_CET (supervisor mode) MSRs are set.
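The two preconditions can be expressed as a pair of bit tests. The following is a rough sketch only: the bit positions (CR4.CET at bit 23, and an ENDBR_EN flag at bit 2 of IA32_U_CET / IA32_S_CET) are assumptions that are not stated in this answer and should be checked against the Intel manual, and `ibt_active` is a hypothetical helper name. Reading these registers for real requires ring 0; here they are passed in as plain integers.

```python
CR4_CET = 1 << 23   # ASSUMED bit position of CR4.CET
ENDBR_EN = 1 << 2   # ASSUMED bit position of the tracking-enable flag in the CET MSRs


def ibt_active(cr4: int, u_cet: int, s_cet: int, cpl: int) -> bool:
    """True if indirect branch tracking would be in effect at privilege level `cpl`."""
    if not cr4 & CR4_CET:
        return False                        # CET is globally off
    msr = u_cet if cpl == 3 else s_cet      # user vs. supervisor MSR
    return bool(msr & ENDBR_EN)


# CET on, tracking enabled for user mode only.
print(ibt_active(cr4=CR4_CET, u_cet=ENDBR_EN, s_cet=0, cpl=3))
print(ibt_active(cr4=CR4_CET, u_cet=ENDBR_EN, s_cet=0, cpl=0))
```

With these inputs the first call is true and the second false, which reflects that user-mode and supervisor-mode tracking are controlled by separate MSRs.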

The CPU sets up a small state machine which tracks the type of the last branch. Take the following example:

some_function:
    mov rax, qword [vtable+8]
    call rax
    ...

check_login:
    endbr64
    ...
authenticated:
    mov byte [is_admin], 1
    ...
    ret

Let's now briefly look at two scenarios.

No attack:

  1. some_function retrieves the address of the virtual method check_login from the virtual method table vtable and calls it.
  2. Since this is an indirect call, the CET state machine is activated and set to trigger on the next instruction (TRACKER = WAIT_FOR_ENDBRANCH).
  3. The next instruction is endbr64, so the indirect call is considered "safe" and execution continues (the endbr64 still behaves as a no-op). The state machine is reset (TRACKER = IDLE).

Attack:
An attacker somehow managed to manipulate vtable such that vtable+8 now points to authenticated.

  1. some_function retrieves the address of authenticated from the virtual method table vtable and calls it.
  2. Since this is an indirect call, the CET state machine is activated and set to trigger on the next instruction (TRACKER = WAIT_FOR_ENDBRANCH).
  3. The next instruction is mov byte [is_admin], 1, not the expected endbr64 instruction. The CET state machine infers that control flow was manipulated and raises a #CP fault, terminating the program.

Without CET, the control flow manipulation would have gone unnoticed and the attacker would have obtained admin privileges.
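The two scenarios can be replayed with a small Python model of the example. It is a deliberate simplification: the vtable is a dictionary keyed by offset, each label is represented only by its first instruction, and `call_through_vtable`, `run` and `ControlProtectionFault` are invented names, not anything the CPU or GCC provides.

```python
class ControlProtectionFault(Exception):
    """Stands in for the #CP fault (hypothetical name)."""


# First instruction found at each label in the assembly example.
FIRST_INSTRUCTION = {
    "check_login": "endbr64",
    "authenticated": "mov byte [is_admin], 1",
}


def call_through_vtable(vtable, offset):
    target = vtable[offset]             # mov rax, qword [vtable+8]
    tracker = "WAIT_FOR_ENDBRANCH"      # call rax arms the tracker
    if FIRST_INSTRUCTION[target] != "endbr64":
        raise ControlProtectionFault(f"{target} does not start with endbr64")
    tracker = "IDLE"                    # endbr64 seen, tracker is reset
    return target, tracker


def run(vtable):
    try:
        target, tracker = call_through_vtable(vtable, 8)
        return f"reached {target}, tracker {tracker}"
    except ControlProtectionFault as fault:
        return f"#CP: {fault}"


print(run({8: "check_login"}))      # no attack
print(run({8: "authenticated"}))    # vtable entry overwritten by the attacker
```

The first call reaches `check_login` with the tracker back at IDLE; the second is stopped with the simulated #CP before the `mov byte [is_admin], 1` would take effect.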


In summary, the indirect branch tracking feature of Intel CET ensures that indirect calls and jumps can only redirect to functions which start with an endbr64 instruction.

Note that this does not ensure that the right function is called: if an attacker changes control flow to jump to a different function which also starts with endbr64, the state machine won't complain and the program will keep executing. However, this still greatly reduces the attack surface, as most JOP/COP attacks target instructions mid-function (or even jump right "into" instructions).
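To make that limitation concrete, the sketch below lists which labels an indirect branch may legally land on. `delete_account` is a hypothetical extra function added purely for illustration, and `valid_indirect_targets` is a made-up helper; the point is only that the check is "starts with endbr64", not "is the intended callee".

```python
FUNCTIONS = {
    "check_login": ["endbr64", "..."],
    "authenticated": ["mov byte [is_admin], 1", "...", "ret"],
    "delete_account": ["endbr64", "..."],   # hypothetical, not in the example above
}


def valid_indirect_targets(functions):
    """Labels whose first instruction is endbr64, i.e. those the tracker accepts."""
    return sorted(
        name for name, body in functions.items()
        if body and body[0] == "endbr64"
    )


print(valid_indirect_targets(FUNCTIONS))
```

An attacker who can overwrite the vtable entry can still pick any label in that list, so redirecting the call from `check_login` to `delete_account` would pass the check, while `authenticated` would not.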

janw
  • Is it possible to set CR4.CET from ring 3, or does it need support from the kernel? – Mah35h Sep 17 '21 at 16:05
  • 3
    `CR4` can only be changed in ring 0, thus kernel support is necessary. – janw Sep 17 '21 at 16:12
  • How does this work if e.g. an interrupt or page fault happens between the indirect branch and the endbr64? – TLW Dec 08 '22 at 08:07
  • 1
    Branch/jump/call are LOAD instructions -- they do not retire until the code to be jumped to is paged in, checked for page_execute permission, – William Cushing Feb 02 '23 at 10:05
  • @WilliamCushing - I am confused, because a series of back-to-back jumps (without CET) does not prevent interrupts. – TLW Apr 29 '23 at 19:01
  • Consider the following two sequences. A: 1.`call foo` instruction executes. `foo` contains a valid `endbr64` at this point in time. 2. interrupt or page fault that swaps out the page containing `foo` with a page that does not have `endbr64`. 3. return from interrupt. B: 1.`nop` instruction at the end of the page before `foo` executes. `foo` contains a valid `endbr64` at this point in time. 2. interrupt or page fault that swaps out the page containing `foo` with a page that does not have `endbr64`. 3. return from interrupt. – TLW Apr 29 '23 at 19:07
  • 2
    I don't see how the behavior can be different between these two scenarios without an additional bit of state that the OS has to save and restore. – TLW Apr 29 '23 at 19:07
  • (And yes, I know that `call foo` isn't indirect. Assume that's actually an indirect branch that happens to target `foo`.) – TLW Apr 29 '23 at 19:10
  • 1
    @TLW the OS does need to save state in the scenario you described. The Intel manual describes that the CET state, including the state of the indirect branch tracker and the shadow stack pointer, can be saved and restored using the XSAVE family of instructions (XSAVES and XRSTORS). Specifically, the tracker bits are in the IA32_U_CET MSR. – Dougvj Jul 16 '23 at 13:07