
Is it allowed for a single access to span the boundary between 0 and 0xFFFFFF... in x86 [1]?

For example, given that eax (rax in 64-bit) is zero, is the following access allowed:

mov ebx, DWORD [eax - 2]

I'm interested in both x86 (32-bit) and x86-64 in case the answers are different.


[1] Of course, given that the region is mapped in your process, etc.

Michael Petch
BeeOnRope
  • That would require the zero page to be mapped, which would presumably be troubling... – Oliver Charlesworth Dec 07 '17 at 19:31
  • 1
    @OliverCharlesworth: some users configure their machine to [make that possible for WINE](https://wiki.winehq.org/Preloader_Page_Zero_Problem) , but I think you still only get that page used if you ask for it explicitly, not from `mmap(NULL, ...)` – Peter Cordes Dec 07 '17 at 19:33
  • Is your question basically whether `[eax - 2]` will wrap around to the end of memory? – Barmar Dec 07 '17 at 19:38
  • Yeah I mean if the page is mapped: the question is about what x86 hardware allows, not whether it's practical on any particular OS. I suppose at the hardware level it's not any harder than a regular page splitting load. – BeeOnRope Dec 07 '17 at 19:40
  • @Barmar I think the question assumes it would if the load were 1 or 2 bytes, but is asking whether a load of 4 bytes or more is valid at that address. – Daniel H Dec 07 '17 at 19:40
  • @DanielH - yeah I'm asking about a 4 byte load, half of which is at the very high end of the address space and the other two at bytes 0 and 1 (low end). – BeeOnRope Dec 07 '17 at 19:43
  • @DanielH: yes, if you want to state it that way, given page tables and so on such that `movzx ebx, word [eax-2]` and `movzx ebx, word [eax]` are both aligned and both succeed. – Peter Cordes Dec 07 '17 at 19:44
  • Can someone provide a link to a reference that supports the wraparound? I've been googling, trying to find anything that mentions it. – Barmar Dec 07 '17 at 19:48
  • I tried to test it (in compat mode on Linux: 64-bit kernel so all 4GiB are usable). `gcc -Og` gives me the asm I want (https://godbolt.org/g/QzxyxE, but `-O3` uses `ud2` on dereferencing `-2` as a pointer, so don't optimize). `echo 0 | sudo tee /proc/sys/vm/mmap_min_addr` lets `mmap((void*)0, ..., MAP_FIXED)` succeed. But I ran into trouble mapping the top page: strace says `mmap2(0xfffff000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)`. And the vDSO is mapped at the page below that. @MichaelPetch, any ideas? – Peter Cordes Dec 07 '17 at 20:07
  • @PeterCordes : First off I don't think the question was one of what would be allowed by _C_ but what the processor itself allows for a memory operand on x86. Maybe I have misread this but I felt this question was more about what the processor allows. – Michael Petch Dec 07 '17 at 20:12
  • @MichaelPetch - right, but I think Peter is just using C as a simple way to test this, rather than messing around in assembly. Probably the kernel has some good reason to prevent you mapping this page? It likes to stuff the user-visible kernel stuff up there, like the VDSO, and so it probably has a check like `if (hint_addr > UPPER_LIMIT)` where `UPPER_LIMIT` carves out some small number of pages for kernel/user shared mappings, and the last page is thus blocked. Maybe the VDSO can be 2 pages in some scenarios, or maybe there is something that goes in that last page sometimes? – BeeOnRope Dec 07 '17 at 20:15
  • @MichaelPetch: Yes, what Bee said. C and `volatile short *` were just handy ways to get the compiler to make the asm I wanted to test, instead of writing it myself. The hard part is getting the mmap system call to succeed to map the memory in the first place. If we can't get that to work, testing would require a custom kernel where we set up the page table ourselves (or leave paging disabled). – Peter Cordes Dec 07 '17 at 20:19
  • So is this a question about what Linux allows or what the processor allows? – Michael Petch Dec 07 '17 at 20:19
  • @MichaelPetch: Now we're just working on a way to experimentally test on real hardware. Preferably (for me) under Linux, but Windows or OS X would be fine, too. – Peter Cordes Dec 07 '17 at 20:19
  • From the get go, I assumed the only viable way to do this was to probably write something like a semi-trivial multiboot compliant test harness and see if in 32-bit protected mode (with and without paging) and 64-bit long mode what happens. Of course with paging enabled you'd have to make sure the first and last logical page are mapped to physical memory with appropriate rights. – Michael Petch Dec 07 '17 at 20:26
  • 2
    I can't find anything about it in the manual, but if I actually try it, it just works. – harold Dec 07 '17 at 20:34
  • @MichaelPetch - what the processor allows. – BeeOnRope Dec 07 '17 at 20:50
  • @harold - thanks! How were you able to set up to try this? I guess that is consistent with the manual not saying anything about it: if it's not explicitly disallowed, I suppose it should and would work. – BeeOnRope Dec 07 '17 at 20:51
  • Just booted straight into pmode and manually set up those pages. I ran it in virtualbox in case it matters, though it shouldn't. – harold Dec 07 '17 at 21:01
  • @harold, with virtualbox - on the safe side I'd make sure you're not using paravirtualization (disabled) and that VT-x/AMD-V is enabled. This eliminates paravirtualization from doing something unexpected (ie buggy lol). – Michael Petch Dec 07 '17 at 21:21
  • 1
    @MichaelPetch yes I had it set to that (as usual, because yea.. trouble) – harold Dec 07 '17 at 21:28
  • It appears nobody answered the x86 case. In real mode, a modern CPU will fault and I'm confident it will also do in protected mode. This is due to the limit check. An original 8086, 80186 (and maybe an 80286) won't fault as there was no limit check back then. Analogously, an x86-64 won't fault as proven below. – Margaret Bloom Dec 08 '17 at 08:48
  • Amend: The protected-mode case is implementation specific, as stated in the manual. Due to the low-12-bits issue with limit checks, I guess. – Margaret Bloom Dec 08 '17 at 08:53
  • @Margaret, I have adapted my code to test the 32 bit case, but I haven't gotten it working yet. I wrote it with the expectation that it would work, though; if you think it may not work, I'll have to add an exception handler. Ugh. – prl Dec 08 '17 at 09:38
  • @prl, Yes, it's a bit laborious :) I once tested the 32-bit case, but that was 13 years ago and I can't remember the outcome – Margaret Bloom Dec 08 '17 at 12:40
  • 2
    I've tested both real and protected mode and the former faults while the latter doesn't. However, the 32-bit case is implementation specific. I don't think this will ever change in mainstream Intel processors but for other brands or spin-off architectures, it may. – Margaret Bloom Dec 08 '17 at 13:42
  • I believe the 12-bit issue Margaret refers to is that to set a limit beyond 2^20 bytes you need to move to 4kb granularity in the descriptor for the limit (Limit is a 20-bit value). – Michael Petch Dec 08 '17 at 17:43
  • 1
    @MargaretBloom : The 80286 datasheet says this: _All segments in real address mode are 64kbytes in size and may be read, written, or executed. An exception or interrupt can occur if data operands or instructions attempt to wrap around the end of a segment (e.g., a word with its low order byte at offset FFFF(H) and its high order byte at offset 0000(H))_.. It is interesting that it says **can** and not **will** although at the time I'm assuming they may have meant it will. – Michael Petch Dec 08 '17 at 17:58
  • 2
    I did some experimentation. On my processor if I am in 32-bit protected mode and I'm using a regular flat 4gb memory model (where base is 0) and I write a word across the end of memory there is no fault. As a different experiment If I change the base in the descriptor to 1 (instead of the normal zero) keeping the 4gb limit, and I attempt to write to a selector using that descriptor (ie ES) with the offset 0xffffffff it will fault. It seems with a non zero base if added to the effective address of the memory operand it will fault (it doesn't wrap to 0) if the computed address is 2^32 or higher. – Michael Petch Dec 08 '17 at 22:32
  • 2
    In the second experiment in my last comment though (the case where base is 1) and I do a word write with a selector using that descriptor (ie ES) with the offset 0xfffffffe it will not fault and will wrap. So the base+effective address check before a memory access is done can't wrap but after that if the write itself crosses the end of memory it will wrap. – Michael Petch Dec 08 '17 at 22:45
  • The other check the processor does is whether the limit in the descriptor being used has been reached. In protected mode, if I set granularity to byte, set the limit to 0 (allowing access to only one byte), and set the base to 0xffffffff, a word write will fault (if I change the limit to 1 it works). This is similar to the phenomenon that is documented for real mode: if you attempt to write across the limit (286+), it will fault. – Michael Petch Dec 08 '17 at 23:02
  • 2
    So what happens if we butcher real mode a bit. Enable protected mode and set up a GDT with a 16-bit descriptor with a base of 0xffffffff (use a limit at 0xffff). Set _ES_ to that descriptor then turn off protected mode without reloading ES (processor will still use the cached base address if you don't reload ES with a value). If we write a word to 0x0000 the write succeeds and the memory operation wraps. Real mode seems to behave the same way protected mode does. – Michael Petch Dec 08 '17 at 23:10
  • 2
    Earlier I said the processor would fault if the base plus the effective address was 2^32 or higher. It is more specific than that. It only faults if the memory operand consists solely of an offset value (so no index and scaling) and adding it to the base can't be represented in 32 bits. – Michael Petch Dec 09 '17 at 00:03
  • 1
  • @MichaelPetch, Interesting. I set up a descriptor for a segment starting at 2GiB, with a limit of 4GiB. Accessing an offset of 3GiB (0xc0000000) in that segment doesn't fault on my Haswell, and at (2 + 3) % 4 = 1 GiB I found the value written. I've always thought of it as the CPU checking only the offset against the segment limit and, in case of success, performing an addition modulo 2^32 to add the offset to the base. I'll double check my code for mistakes. – Margaret Bloom Dec 09 '17 at 13:56

2 Answers


I just tested with this EFI program. (And it worked, as expected.) If you want to reproduce this result, you would need an implementation of efi_printf, or another way to view the result.

#include <stdint.h>
#include "efi.h"

uint8_t *p = (uint8_t *)0xfffffffffffffffcULL;

int main()
{
    uint64_t cr3;
    asm("mov %%cr3, %0" : "=r"(cr3));
    uint64_t *pml4 = (uint64_t *)(cr3 & ~0xfffULL);

    efi_printf("cr3 %lx\n", cr3);
    efi_printf("pml4[0] %lx\n", pml4[0]);
    uint64_t *pdpt = (uint64_t *)(pml4[0] & ~0xfffULL);
    efi_printf("pdpt[0] %lx\n", pdpt[0]);
    if (!(pdpt[0] & 1)) {
        pdpt[0] = (uint64_t)efi_alloc_pages(EFI_BOOT_SERVICES_DATA, 1) | 0x03;
        efi_printf("pdpt[0] %lx\n", pdpt[0]);
    }
    uint64_t *pd = (uint64_t *)(pdpt[0] & ~0xfffULL);
    efi_printf("pd[0] %lx\n", pd[0]);
    if (!(pd[0] & 1)) {
        pd[0] = (uint64_t)efi_alloc_pages(EFI_BOOT_SERVICES_DATA, 1) | 0x03;
        efi_printf("pd[0] %lx\n", pd[0]);
    }
    if (!(pd[0] & 0x80)) {
        uint64_t *pt = (uint64_t *)(pd[0] & ~0xfffULL);
        efi_printf("pt[0] %lx\n", pt[0]);
        if (!(pt[0] & 1)) {
            pt[0] = (uint64_t)efi_alloc_pages(EFI_BOOT_SERVICES_DATA, 1) | 0x03;
            efi_printf("pt[0] %lx\n", pt[0]);
        }
    }

    efi_printf("[0] = %08x\n", *(uint32_t *)(p+4));

    efi_printf("pml4[0x1ff] %lx\n", pml4[0x1ff]);
    if (pml4[0x1ff] == 0) {

        uint64_t *pt = (uint64_t *)efi_alloc_pages(EFI_BOOT_SERVICES_DATA, 4);
        uint64_t x = (uint64_t)pt;

        efi_printf("pt = %p\n", pt);

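        /* The four allocated pages are contiguous: page 0 serves as the PDPT,
         * page 1 as the PD, page 2 as the PT and page 3 as the data page.
         * Entry 0x1ff of each table maps the top of the address space. */
        pml4[0x1ff] = x | 0x3;
        pt[0x1ff] = (x + 0x1000) | 0x3;
        pt[0x3ff] = (x + 0x2000) | 0x3;
        pt[0x5ff] = (x + 0x3000) | 0x3;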

        *(uint32_t *)p = 0xabcdabcd;
        *(uint32_t *)(p + 4) = 0x12341234;

        efi_printf("[0] = %08x\n", *(uint32_t *)(p+4));
        efi_printf("[fffffffffffc] = %08x\n", *(uint32_t *)(x + 0x3ffc));

        *(uint32_t *)(p + 2) = 0x56785678;

        efi_printf("p[0] = %08x\n", ((uint32_t *)p)[0]);
        efi_printf("p[1] = %08x\n", ((uint32_t *)p)[1]);
    }

    return 0;
}

If it works as expected, the last 4 lines should be:

[0] = 12341234
[fffffffffffc] = ABCDABCD
p[0] = 5678ABCD
p[1] = 12345678

The value 0x56785678 is written starting at the last 16-bit word of memory, so its low half lands in that word and its high half should wrap around to the first 16-bit word of memory.


Note: p needed to be a global variable; otherwise GCC could see that *(p+4) dereferences a compile-time-constant address of 0 and changed it into ud2.

prl
  • 2
    [I found](https://stackoverflow.com/questions/47702410/is-it-allowed-to-access-memory-that-spans-the-zero-boundary-in-x86#comment82366009_47702410) that compiling with only `-Og` instead of `-O2` or higher got gcc to make the asm I wanted. But sure, a global works too, to stop the optimizer from seeing a compile-time constant. – Peter Cordes Dec 07 '17 at 22:04

This is not really a new answer but was too big for a comment. This is @prl's code converted so that it should run with the basic gnu-efi package available on many Linux distros. File wraptest.c:

#include <efi.h>
#include <efiapi.h>
#include <efilib.h>
#include <inttypes.h>
#include <stdint.h>

uint8_t *p = (uint8_t *)0xfffffffffffffffcULL;

EFI_STATUS
EFIAPI
efi_main (EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable)
{
    uint64_t cr3;

    InitializeLib(ImageHandle, SystemTable);
    asm("mov %%cr3, %0" : "=r"(cr3));
    uint64_t *pml4 = (uint64_t *)(cr3 & ~0xfffULL);

    Print(L"cr3 %lx\n", cr3);
    Print(L"pml4[0] %lx\n", pml4[0]);
    uint64_t *pdpt = (uint64_t *)(pml4[0] & ~0xfffULL);
    Print(L"pdpt[0] %lx\n", pdpt[0]);
    if (!(pdpt[0] & 1)) {
        uefi_call_wrapper(BS->AllocatePages, 4, AllocateAnyPages, \
                          EfiBootServicesData, 1, &pdpt[0]);
        pdpt[0] |= 0x03;
        Print(L"pdpt[0] %lx\n", pdpt[0]);
    }
    uint64_t *pd = (uint64_t *)(pdpt[0] & ~0xfffULL);
    Print(L"pd[0] %lx\n", pd[0]);
    if (!(pd[0] & 1)) {
        uefi_call_wrapper(BS->AllocatePages, 4, AllocateAnyPages, \
                          EfiBootServicesData, 1, &pd[0]);
        pd[0] |= 0x03;
        Print(L"pd[0] %lx\n", pd[0]);
    }
    if (!(pd[0] & 0x80)) {
        uint64_t *pt = (uint64_t *)(pd[0] & ~0xfffULL);
        Print(L"pt[0] %lx\n", pt[0]);
        if (!(pt[0] & 1)) {
            uefi_call_wrapper(BS->AllocatePages, 4, AllocateAnyPages, \
                              EfiBootServicesData, 1, &pt[0]);
            pt[0] |= 0x03;
            Print(L"pt[0] %lx\n", pt[0]);
        }
    }

    Print(L"[0] = %08x\n", *(uint32_t *)(p+4));

    Print(L"pml4[0x1ff] %lx\n", pml4[0x1ff]);
    if (pml4[0x1ff] == 0) {
        uint64_t *pt;
        uefi_call_wrapper(BS->AllocatePages, 4, AllocateAnyPages, \
                          EfiBootServicesData, 4, &pt);
        uint64_t x = (uint64_t)pt;

        Print(L"pt = %lx\n", pt);

        pml4[0x1ff] = x | 0x3;
        pt[0x1ff] = (x + 0x1000) | 0x3;
        pt[0x3ff] = (x + 0x2000) | 0x3;
        pt[0x5ff] = (x + 0x3000) | 0x3;

        *(uint32_t *)p = 0xabcdabcd;
        *(uint32_t *)(p + 4) = 0x12341234;

        Print(L"[0] = %08x\n", *(uint32_t *)(p+4));
        Print(L"[fffffffffffc] = %08x\n", *(uint32_t *)(x + 0x3ffc));

        /* This write should place 0x5678 in the last 16-bit word of memory
         * and 0x5678 at the first 16-bit word in memory. If the wrapping
         * works as expected p[0] should be 0x5678ABCD and
         * p[1] should be 0x12345678 when displayed. */
        *(uint32_t *)(p + 2) = 0x56785678;

        Print(L"p[0] = %08x\n", ((uint32_t *)p)[0]);
        Print(L"p[1] = %08x\n", ((uint32_t *)p)[1]);
    }

    return 0;
}

A Makefile that should work on 64-bit Ubuntu and 64-bit Debian could look like this:

ARCH            ?= $(shell uname -m | sed s,i[3456789]86,ia32,)
ifneq ($(ARCH),x86_64)
LIBDIR          = /usr/lib32
else
LIBDIR          = /usr/lib
endif

OBJS            = wraptest.o
TARGET          = wraptest.efi

EFIINC          = /usr/include/efi
EFIINCS         = -I$(EFIINC) -I$(EFIINC)/$(ARCH) -I$(EFIINC)/protocol
LIB             = $(LIBDIR)
EFILIB          = $(LIBDIR)
EFI_CRT_OBJS    = $(EFILIB)/crt0-efi-$(ARCH).o
EFI_LDS         = $(EFILIB)/elf_$(ARCH)_efi.lds

CFLAGS          = $(EFIINCS) -fno-stack-protector -fpic \
                  -fshort-wchar -mno-red-zone -Wall -O3
ifeq ($(ARCH),x86_64)
  CFLAGS += -DEFI_FUNCTION_WRAPPER
endif

LDFLAGS         = -nostdlib -znocombreloc -T $(EFI_LDS) -shared \
                  -Bsymbolic -L $(EFILIB) -L $(LIB) $(EFI_CRT_OBJS)

all: $(TARGET)

wraptest.so: $(OBJS)
        ld $(LDFLAGS) $(OBJS) -o $@ -lefi -lgnuefi

%.efi: %.so
        objcopy -j .text -j .sdata -j .data -j .dynamic \
                -j .dynsym  -j .rel -j .rela -j .reloc \
                --target=efi-app-$(ARCH) $^ $@

The code as written will only work properly if compiled for x86-64. You can make this EFI application with the command:

make ARCH=x86_64

The resulting file should be wraptest.efi, which can be copied to your EFI System Partition. The Makefile was based on Roderick Smith's tutorial.

Michael Petch