
Consider a custom data structure that represents a 3-byte unsigned integer. In assembly on a little-endian machine, one can simply load 4 bytes into a register via a dword pointer and, for instance, add two 3-byte integers, store the low word of the result at memory address 'x', shift the result right by 16, and store the low byte at memory address x + 2. The slower version of the load, which avoids undefined behavior, would be to first zero out register 'a', load the low word into register 'a', load the third byte into register 'b', shift 'b' left by 16, and OR registers 'a' and 'b' into another register.
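
For illustration, the byte-safe load and store described above might look like this in C. This is a minimal sketch, not code from the question: the names `load_u24`, `store_u24` and `add_u24` are hypothetical, and the helpers assemble the value byte by byte, so they never touch a fourth byte and do not depend on the host's endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for a 3-byte little-endian unsigned integer. */

/* Read exactly 3 bytes starting at p and combine them into a 32-bit value. */
static inline uint32_t load_u24(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

/* Write exactly 3 bytes starting at p; v must fit in 24 bits. */
static inline void store_u24(unsigned char *p, uint32_t v)
{
    assert(v <= 0xFFFFFFu);
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
}

/* dst = (a + b) mod 2^24, touching only 3 bytes per operand. */
static inline void add_u24(unsigned char *dst,
                           const unsigned char *a,
                           const unsigned char *b)
{
    store_u24(dst, (load_u24(a) + load_u24(b)) & 0xFFFFFFu);
}
```

Optimizing compilers commonly merge the three byte accesses into a word access plus a byte access, which matches the "slower" sequence above rather than the 4-byte load.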

So now let's assume that a given 3-byte integer is "right at the edge" of the program space or data segment; intuitively, you'd access memory illegally by dereferencing the integer's address as a dword, wouldn't you? Or more generally: when does a memory access violation occur on a read? Is only the base address checked, or the whole range from baseAddress to (baseAddress + lengthInBytes)? I've searched for weeks without finding an answer, which is why I'm asking the community...

  • Easy enough to test, and yes it does indeed fault in that case. That's why SIMD strlen has to be careful to check that it's far enough from the end of a page before doing a 16-byte unaligned load, or only do aligned loads. You can't sneak your way into reading a byte from the start of a supervisor-only page, or get the HW to fill in zeros or anything from bytes from an unmapped page (no valid translation available at all) that are part of a multi-byte load. – Peter Cordes Mar 21 '21 at 23:15
  • Answers you're finding that only mention the starting address would make sense for *aligned* loads, because a naturally-aligned access can't cross any wider boundary. So in that case, it's the same thing as requiring all the bytes to come from valid pages. (And if you were using segmentation, I think from within the segment limit.) – Peter Cordes Mar 21 '21 at 23:17
  • And BTW, I wrote a big answer on [How to MOVe 3 bytes (24bits) from memory to a register?](https://stackoverflow.com/q/47832367) a while ago. – Peter Cordes Mar 21 '21 at 23:27
  • Hey - thanks! The _actual_ reason I asked this question is related to SIMD... Namely the sum of an array of unsigned chars/bytes. Compilers I tried only generate SIMD code that zero extends them to 32 bit integer vectors. Converting them to shorts adds twice as many per iteration but for optimal performance, you'd use a base pointer + multiples of 8 as an xmmword ptr, which can overrun. I just came up with a simpler example without thinking about loading 16 bits into AX. I tried testing it in C just now with 2 global 3-byte variables, not causing a segfault. I'd like to know how to test it! – MrUnbelievable92 Mar 21 '21 at 23:44
  • 2
    To test it, you'd `mmap` a page, then do a *4* byte load from 3 bytes before the end of the page, like `*(volatile int32_t*)(page+4093)`. If you tell the compiler you have a 3-byte bitfield object, it's always going to load it safely. – Peter Cordes Mar 21 '21 at 23:50
  • Or in hand-written NASM with `section .bss` / `align 4096` / `buf: resb 4096` and write a load like `mov eax, [buf + 4093]` / `mov eax, [buf+4096]` and see if the 2nd load is reached. (If it doesn't fault either, there's another BSS page after this, maybe from libc or CRT code if you linked with them instead of a simple static executable; in that case maybe reduce to `buf: resb 100` so the other BSS stuff can fit into the same page.) – Peter Cordes Mar 21 '21 at 23:54
  • re: vectorization: silly compilers. The good way to sum unsigned bytes (without overflow) is to `psadbw` against `0` to widen to two 64-bit elements, then `paddq`, or just `paddd` is faster on some CPUs, and usable if you don't need 64-bit sums. – Peter Cordes Mar 21 '21 at 23:57
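
The `mmap` test suggested in the comments could be sketched in C roughly as follows. Several things here are assumptions that do not come from the thread: a POSIX system, the hypothetical helper name `dword_load_faults`, an explicit `PROT_NONE` guard page (rather than relying on the next page happening to be unmapped), `memcpy` for the 4-byte read (compilers typically turn it into a single load, but that is not guaranteed), and a forked child so that the fault does not kill the calling process.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: maps one readable page followed by one inaccessible
   guard page, then performs a 4-byte read at buf + offset in a child process.
   Returns 1 if the child was killed by SIGSEGV/SIGBUS, 0 if the read
   completed, -1 if the setup failed. */
int dword_load_faults(size_t offset)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(offset + sizeof(uint32_t) <= 2 * page);

    unsigned char *buf = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    /* Make the second page inaccessible so any byte read from it faults. */
    if (mprotect(buf + page, page, PROT_NONE) != 0) {
        munmap(buf, 2 * page);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: do the 4-byte read; the volatile store keeps it alive. */
        uint32_t v;
        volatile uint32_t sink;
        memcpy(&v, buf + offset, sizeof v);
        sink = v;
        (void)sink;
        _exit(0);
    }

    int status = 0;
    int result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid)
        result = WIFSIGNALED(status) &&
                 (WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS);
    munmap(buf, 2 * page);
    return result;
}
```

With `page` set to the value of `sysconf(_SC_PAGESIZE)`, `dword_load_faults(page - 4)` should return 0, because all four bytes lie in the readable page, while `dword_load_faults(page - 3)` should return 1, because the last byte of the read falls in the guard page.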

0 Answers