Why do these two pointers that should be the same point to different data?

Question

I'm writing a FAT16 driver in GNU C for a hobby operating system, and I have a structure defined as such:

struct directory_entry {
  uint8_t name[11];
  uint8_t attrib;
  uint8_t name_case;
  uint8_t created_decimal;
  uint16_t created_time;
  uint16_t created_date;
  uint16_t accessed_date;
  uint16_t ignore;
  uint16_t modified_time;
  uint16_t modified_date;
  uint16_t first_cluster;
  uint32_t length;
} __attribute__ ((packed));

I was under the impression that name would be at the same address as the whole struct, and that attrib would be 11 bytes after that. And indeed, (void *)e.name - (void *)&e is 0 and (void *)&e.attrib - (void *)&e is 11, where e is of type struct directory_entry.

In my kernel, a void pointer to e is passed to a function which reads its contents from a disk. After this function, *(uint8_t *)&e is 80 and *((uint8_t *)&e + 11 is 8, as expected for what's on the disk. However, e.name[0] and e.attrib both are 0.

What gives here? Am I misunderstanding how __attribute__ ((packed)) works? Other structs with the same attribute work how I expect at other parts of my kernel. I can post a link to the full source if needed.

Edit: The full source is in this gitlab repository, on the stack-overflow branch. The relevant part is lines 34 to 52 of src/kernel/main.c. I'm sure that the data is being populated right, as I check *(uint8_t *)&e and *((uint8_t *)&e + 11). When I run it, the following is output by that part:

(void *)e.name - *(void *)&e
  => 0
*(uint8_t *)&e
  => 80
e.name[0]
  => 0
(void *)&e.attrib - (void *)&e
  => 11
*((uint8_t *)&e + 11)
  => 8
e.attrib
  => 0

I'm very confused about why e.name[0] would be any different than *(uint8_t *)&e.

Edit 2: I disassembled this part using objdump, to see what the difference was in the compiled code, but now I'm even more confused. u8_dec(*(uint8_t *)&e, nbuf); and u8_dec(e.name[0], nbuf); are both compiled to: (comments mine)

lea   eax, [ebp - 0x30] ;loads address of e from stack into eax
movzx eax, byte [eax]   ;loads byte pointed to by eax into eax, zero-extending
movzx eax, al           ;not sure why this is here, as it's already zero-extended
sub esp, 0x8
push  0x31ce0 ;nbuf
push  eax     ;the byte we loaded
call  0x3162f ;u8_dec
add esp, 0x10

This passes in the first byte of the struct, as expected. I'm sure that u8_dec doesn't modify e, as its first argument is passed by value and not by reference. nbuf is an array declared at file-scope, while e is declared at function scope, so it's not that they overlap or anything. Perhaps u8_dec isn't doing its job right? Here's the source of that:

void u8_dec(uint8_t n, uint8_t *b) {
  if (!n) {
    *(uint16_t *)b = '0';
    return;
  }
  bool zero = false;
  for (uint32_t m = 100; m; m /= 10) {
    uint8_t d = (n / m) % 10;
    if (zero)
      *(b++) = d + '0';
    else if (d) {
      zero = true;
      *(b++) = d + '0';
    }
  }
  *b = 0;
}

It's pretty clear now that packed structs do work how I think they do, but I'm still not sure what's causing the problem. I'm passing the same value to a function that should be deterministic, but I'm getting different results on different calls.

Are you sure the contents you're getting from the disk are what you expect? — Joseph Sible-Reinstate Monica, May 24 '20 at 21:24
*`(void *)&e.attrib - (void *)&e`* is not proper C code. You can't do pointer arithmetic using `void *` pointers. — Andrew Henle, May 24 '20 at 21:29
@AndrewHenle: You can with gcc, and the question is already tagged [tag:gcc] and using other gcc extensions... — Nate Eldredge, May 24 '20 at 21:32
Your understanding of `__attribute__((packed))` is correct, so your bug is likely something else. — Nate Eldredge, May 24 '20 at 21:35
@AndrewHenle: See, this is why I keep telling people that the C standard forbids very little. You say “can”t” but in fact here is somebody doing it, proving that you can. *Undefined* behavior is not **forbidden** behavior. — Eric Postpischil, May 24 '20 at 21:57
I don't see anything obviously wrong in the code either. I think it's time to start single-stepping it with a debugger and seeing what actually happens. I tried briefly to get your code running in an emulator but didn't immediately succeed. — Nate Eldredge, May 24 '20 at 22:42
"not sure why this is here, as it's already zero-extended": just because you're compiling without optimization and gcc very often generates redundant code like this when not optimizing. — Nate Eldredge, May 24 '20 at 22:45
Haven't really followed the code, but the observed behavior could be consistent with a buffer overflow in `vga_printsz` or `u8_dec`. — dxiv, May 24 '20 at 22:57
`*(uint16_t *)b = '0';` is a strict aliasing violation (as well as being fishy) — M.M, May 24 '20 at 22:59
Not that it proves anything, but I converted the relevant part of your code as best I could into a regular userspace program, and it ran correctly. — Nate Eldredge, May 25 '20 at 01:18
Now that you have an answer, please consider writing it as **an answer**. StackOverflow is not a forum where you mark solved questions by editing the title. You write an answer and mark it. — the busybee, May 25 '20 at 06:01

score 4 · Accepted Answer · answered May 25 '20 at 13:25

4

My kernel utilizes 32-bit protected mode segmenting. I had my data segment as 0x0000.0000 - 0x000f.ffff and my stack segment as 0x0003.8000 - 0x0003.ffff, to trigger a general protection fault if the stack over overflowed, rather than allowing it to overflow into other kernel data and code.

However, when GCC compiles C code, it assumes that the stack and data segments have the same base, as this is most often the case. This was causing a problem as when I took the address of the local variable, it was relative to the stack segment (as local variables are on the stack), but when I dereferenced the pointer in the function that was called, it was relative to the data segment.

I have changed my segmenting model so that the stack is in the data segment instead of its own segment, and this has fixed the problem.

answered May 25 '20 at 13:25

Benji Dial

131
7

This was interesting. So when you did `struct directory_entry e; fs_read(root, 32, &e);` the data was not read into `e` (on the stack) but into some other area of memory, since `fs_read` used the pointer relative to ds. When you did `u8_dec(*(uint8_t *)&e, nbuf)`, the compiler generated `lea eax, [ebp - 0x30] ; movzx eax, byte [eax]`. The `[eax]` is relative to `ds`, so this effectively made the same translation and fetched from the "wrong" area, and thus you saw the value you expected. [...] – Nate Eldredge May 25 '20 at 13:42
On the other hand, `u8_dec(e.name[0], nbuf);` compiled into `movzx eax, byte [ebp - 0x30]`. This is ss-relative, so it actually did fetch from the "correct" address on the stack, but of course the expected data wasn't there. – Nate Eldredge May 25 '20 at 13:43
Indeed, the only way for the compiler *not* to assume that ss and ds have the same base would be to use "far" 48-bit seg:ofs pointers for everything, like the 8086 "large" model. This would be very inefficient, since you'd have to reload one of the segment register for every access, which triggers a whole bunch of internal checks. I don't know if there ever existed any 32-bit x86 compiler that could do this. Certainly gcc never did. – Nate Eldredge May 25 '20 at 13:52

Why do these two pointers that should be the same point to different data?

1 Answers1