
I'm trying to execute the "invd" instruction from a kernel module. I previously asked a similar question, How to execute “invd” instruction?, and from @Peter Cordes's answer I understand that I can't safely run this instruction on an SMP system after boot. So shouldn't I be able to run it after boot without SMP support? With no other core running, there should be no chance of memory inconsistency. I have the following kernel module, compiled with the -O0 flag:

static int __init deviceDriver_init(void){

unsigned long flags;
int LEN=10;
int STEP=1;
int VALUE=1;
int arr[LEN];
int i;
unsigned long dummy;

printk(KERN_INFO "invd Driver loaded\n");

//wbinvd();
//asm volatile("cpuid\n":::);

local_irq_disable();

__asm__ __volatile__(

    "wbinvd\n"
    "loop:"
    "movq %%rdx, (%%rbx);"
    "leaq (%%rbx, %%rcx, 8), %%rbx;"
    "cmpq %%rbx, %%rax;"
    "jg loop;"

    "invd\n"
    : "=b"(dummy) // output
    : "b" (arr),
      "a" (arr+LEN),
      "c" (STEP),
      "d" (VALUE)
    : "cc", "memory"
);

local_irq_enable();

//asm volatile("invd\n":::);

printk(KERN_INFO "invd execute\n"); 

return 0; 
}

I'm still getting an error: upon inserting the module I get Segmentation fault (core dumped) in the terminal, and dmesg shows:

[ 2590.518614] invd Driver loaded
[ 2590.518840] general protection fault: 0000 [#5] SMP PTI

I have booted my kernel with nosmp, but I do not understand why dmesg still shows SMP PTI.

$cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-4.15.0-136-generic root=UUID=dbe747ff-a6a5-45cb-8553-c6db6d445d3d ro quiet splash nosmp vt.handoff=7

Update post:

As I mentioned in the comment section, after disabling SGX in the BIOS I was able to run invd without any error. However, when I run the same code on a different machine with the same kernel version, I still get the same error message. It is strange, and I can't explain why this is happening. In the comment section, @prl mentions that the error may be coming from the instruction following invd, and I'm beginning to think that may be true, because the second-to-last line of the dmesg output is highlighted in red: [ 153.527386] RIP: loop+0xc/0xf22 [noSmp8] RSP: ffffb8d9450a7be0. So it seems the error is coming from inside the loop. I have updated the __init function code according to the suggestion. I'm not good at assembly, so can anyone please tell me whether the inline assembly is correct and, if not, how to fix it? My whole dmesg trace is:

[  153.514293] invd Driver loaded
[  153.514547] general protection fault: 0000 [#1] SMP PTI
[  153.514656] Modules linked in: noSmp8(OE+) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables ccm arc4 intel_rapl rt2800usb rt2x00usb x86_pkg_temp_thermal intel_powerclamp rt2800lib coretemp rt2x00lib mac80211 cfg80211 kvm_intel kvm irqbypass snd_hda_codec_realtek crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf dell_smm_hwmon dell_wmi dell_smbios dcdbas intel_wmi_thunderbolt snd_hda_codec_generic dell_wmi_descriptor wmi_bmof snd_seq_midi snd_seq_midi_event
[  153.515454]  serio_raw snd_hda_intel snd_hda_codec snd_hda_core sparse_keymap snd_hwdep snd_rawmidi joydev input_leds snd_seq snd_pcm snd_seq_device snd_timer snd soundcore mei_me mei shpchp intel_pch_thermal mac_hid acpi_pad parport_pc ppdev lp parport autofs4 hid_generic usbhid hid nouveau mxm_wmi ttm drm_kms_helper psmouse syscopyarea sysfillrect sysimgblt igb e1000e dca i2c_algo_bit ptp pps_core ahci libahci fb_sys_fops drm wmi video
[  153.516038] CPU: 0 PID: 4024 Comm: insmod Tainted: G           OE    4.15.0-136-generic #140~16.04.1-Ubuntu
[  153.516331] Hardware name: Dell Inc. BIOS 1.3.2 01/25/2016
[  153.516626] RIP: 0010:loop+0xc/0xf22 [noSmp8]
[  153.516917] RSP: 0018:ffffb8d9450a7be0 EFLAGS: 00010046
[  153.517213] RAX: ffffb8d9450a7c08 RBX: ffffb8d9450a7c08 RCX: 0000000000000001
[  153.517513] RDX: 0000000000000001 RSI: ffffb8d9450a7be0 RDI: ffff8edaadc16490
[  153.517814] RBP: ffffb8d9450a7c60 R08: 0000000000012c40 R09: ffffffffb39624c4
[  153.518119] R10: ffffb8d9450a7c78 R11: 000000000000038c R12: ffffb8d9450a7c10
[  153.518427] R13: 0000000000000000 R14: 0000000000000001 R15: ffff8eda4c6bd660
[  153.518730] FS:  00007fd7f09cf700(0000) GS:ffff8edaadc00000(0000) knlGS:0000000000000000
[  153.519036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  153.519346] CR2: 00005634f95fde50 CR3: 000000040dd2c001 CR4: 00000000003606f0
[  153.519656] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  153.519980] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  153.520289] Call Trace:
[  153.520597]  ? 0xffffffffc050d000
[  153.520899]  do_one_initcall+0x55/0x1ac
[  153.521201]  ? do_one_initcall+0x55/0x1ac
[  153.521504]  ? do_init_module+0x27/0x223
[  153.521808]  ? _cond_resched+0x32/0x50
[  153.522107]  ? kmem_cache_alloc_trace+0x165/0x1c0
[  153.522408]  do_init_module+0x5f/0x223
[  153.522710]  load_module+0x188c/0x1ea0
[  153.523016]  ? ima_post_read_file+0x83/0xa0
[  153.523320]  SYSC_finit_module+0xe5/0x120
[  153.523623]  ? SYSC_finit_module+0xe5/0x120
[  153.523927]  SyS_finit_module+0xe/0x10
[  153.524231]  do_syscall_64+0x73/0x130
[  153.524534]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
[  153.524838] RIP: 0033:0x7fd7f04fd599
[  153.525144] RSP: 002b:00007ffda61c2968 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
[  153.525455] RAX: ffffffffffffffda RBX: 00005643631d7210 RCX: 00007fd7f04fd599
[  153.525768] RDX: 0000000000000000 RSI: 0000564361c3226b RDI: 0000000000000003
[  153.526084] RBP: 0000564361c3226b R08: 0000000000000000 R09: 00007fd7f07c2ea0
[  153.526403] R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000000000
[  153.526722] R13: 00005643631d7ca0 R14: 0000000000000000 R15: 0000000000000000
[  153.527040] Code: 00 48 8b 75 c8 48 8b 45 c8 8b 55 b8 48 63 d2 48 c1 e2 02 48 01 d0 8b 4d b4 8b 55 bc 48 89 f3 48 89 13 48 8d 1c cb 48 39 d8 7f f4 <0f> 08 48 89 d8 48 89 45 d0 e8 40 ef 73 00 48 c7 c7 c7 d0 c4 c0 
[  153.527386] RIP: loop+0xc/0xf22 [noSmp8] RSP: ffffb8d9450a7be0
[  153.530228] ---[ end trace cc9ea64985c9fe34 ]---

So, is it not possible to run invd even without SMP?

user45698746
  • What instruction is the GP occurring on? I don’t think it’s the INVD. – prl Mar 13 '21 at 21:59
  • Your array contains 4 byte ints, but you’re writing it 8 bytes at a time. I don’t think that’s the bug, because the length of the array is even. – prl Mar 13 '21 at 22:00
  • Here’s one bug: rbx is changed by the assembly code, but that isn’t indicated in the constraints. You need a dummy output for rbx. – prl Mar 13 '21 at 22:20
  • @prl, with "invd" commented out, the module works without throwing any error. – user45698746 Mar 13 '21 at 22:21
  • I’m sorry, I was unclear. I didn’t mean that the INVD isn’t the cause, just that the GP occurs on some instruction following the INVD. – prl Mar 13 '21 at 22:28
  • @prl, can you please tell me how can I add a dummy output? I tried something like this `"e=" (rbx)` and I'm getting `error: invalid lvalue in asm output 0` – user45698746 Mar 13 '21 at 22:33
  • Declare `unsigned long dummy;` and then use `"=b"(dummy)`. – prl Mar 13 '21 at 22:35
  • @prl. Thanks. I have added the `dummy` output value. However, the problem `general protection fault: 0000 [#8] SMP PTI` persists. – user45698746 Mar 13 '21 at 22:38
  • `<0f> 08` has the `<>` highlight of current/fault-address RIP on the `0F 08`, which is the machine code for `invd` (https://www.felixcloutier.com/x86/invd). So yes, INVD itself is faulting in that test setup. The `7f f4` preceding it is your `jg rel8` with a small negative displacement, i.e. `jg loop`. – Peter Cordes Mar 14 '21 at 06:10
  • @PeterCordes, ok. Then how did this same code run on a different machine? Strange. – user45698746 Mar 14 '21 at 06:15
  • If it's faulting in kernel mode, sounds like SGX didn't really get disabled, or there's some other mechanism like maybe SMM that can stop it from working. Although https://www.felixcloutier.com/x86/invd only mentions *If the processor reserved memory protections are activated.* as a possible fault reason other than privilege. So I'd guess the firmware on that other machine *does* leave some HW memory protection activated. – Peter Cordes Mar 14 '21 at 06:25
  • Related: [How to explicitly load a structure into L1d cache? Weird results with INVD with CR0.CD = 1 on isolated core with/without hyperthreading](https://stackoverflow.com/q/66772632) was a followup to this, isolating a core. – Peter Cordes Mar 05 '22 at 01:55

1 Answer


There are 2 questions here:

a) How to execute INVD (unsafely)

For this, you need to be running at CPL=0, and you have to make sure the CPU isn't using any "processor reserved memory protections" which are part of Intel's Software Guard Extensions (an extension to allow programs to have a shielded/private/encrypted space that the OS can't tamper with, often used for digital rights management schemes but possibly usable for enhancing security/confidentiality of other things).

Note that SGX is supported in recent versions of Linux, but I'm not sure when support was introduced or how old your kernel is, or if it's enabled/disabled.

If either of these isn't true (e.g. you're at CPL=3 or there are "processor reserved memory protections") you will get a general protection fault exception.

b) How to execute INVD Safely

For this, you have to make sure that the caches (which include "external caches" - e.g. possibly things like eDRAM and caches built into non-volatile RAM) don't contain any modified data that will cause problems if lost. This includes data from:

  • IRQs. These can be disabled.

  • NMI and machine check exceptions. For a running OS it's mostly impossible to stop/disable these and if you can disable them then it's like crossing your fingers while ignoring critical hardware failures (an extremely bad idea).

  • the firmware's System Management Mode. This is a special CPU mode the firmware uses for various things (e.g. ECC scrubbing, some power management, emulation of legacy devices) that's beyond the control of the OS/kernel. It can't be disabled.

  • writes done by the CPU itself. This includes updating the accessed/dirty flags in page tables (which can not be disabled), plus any performance monitoring or debugging features that store data in memory (which can be "not enabled").

With these restrictions (and not forgetting the performance problems) there are only 2 cases where INVD might be sane - early firmware code that needs to determine RAM chip sizes and configure memory controllers (where it's very likely to be useful/sane), and the instant before the computer is turned off (where it's likely to be pointless).

Guesswork

I'm guessing (based on my inability to think of any other plausible reason) that you want to construct a temporary shielded/private area of memory (to enhance security - e.g. so that the data you put in that area won't/can't leak into RAM). In this case (ironically) it's possible that the tool designed specifically for this job (SGX) is preventing you from doing it badly.

Brendan
  • Just asking, do `INVD` and `INVLD` serve the same purpose? I'm a noob here. – user45698746 Mar 13 '21 at 22:42
  • @user45698746: Oops - I misremembered the instruction name (fixing, thanks). – Brendan Mar 13 '21 at 22:44
  • @Brendan, thanks. It seems the SGX option in my BIOS was enabled. I disabled it, and I have also disabled interrupts before the loop. It is working now. Thanks! – user45698746 Mar 13 '21 at 22:53
  • Wow, good call, Brendan. I had guessed that the use of SGX was sufficiently rare that it was unlikely to be a factor. – prl Mar 14 '21 at 00:51
  • As I commented on the OP's previous question, `memset(0)` over this temp buffer (before leaving no-fill mode, if you entered that mode at all) should give equal security benefit without needing `invd` to prevent data from propagating back into memory. Although I guess without no-fill mode, writing some lines in the buffer that you hadn't originally touched could end up evicting some dirty lines to RAM where they'd be present momentarily. (memset(0) + wbinvd could make sure that interval is short, or use a 2nd pass with NT stores.) – Peter Cordes Mar 14 '21 at 01:43
  • Hello again. Is there a way to check if `invd` is actually working? In my code, running `invd` is not throwing any `Protection fault`, however, I'm not seeing my desirable results which makes me confused whether `invd` is really working or not. – user45698746 Mar 23 '21 at 18:56
  • @Brendan, your guess is absolutely correct. That is exactly what I'm trying to do. – user45698746 Mar 23 '21 at 19:00