4

I'm writing a FAT16 driver in GNU C for a hobby operating system, and I have a structure defined as such:

struct directory_entry {
  uint8_t name[11];
  uint8_t attrib;
  uint8_t name_case;
  uint8_t created_decimal;
  uint16_t created_time;
  uint16_t created_date;
  uint16_t accessed_date;
  uint16_t ignore;
  uint16_t modified_time;
  uint16_t modified_date;
  uint16_t first_cluster;
  uint32_t length;
} __attribute__ ((packed));

I was under the impression that name would be at the same address as the whole struct, and that attrib would be 11 bytes after that. And indeed, (void *)e.name - (void *)&e is 0 and (void *)&e.attrib - (void *)&e is 11, where e is of type struct directory_entry.

In my kernel, a void pointer to e is passed to a function which reads its contents from a disk. After this function, *(uint8_t *)&e is 80 and *((uint8_t *)&e + 11 is 8, as expected for what's on the disk. However, e.name[0] and e.attrib both are 0.

What gives here? Am I misunderstanding how __attribute__ ((packed)) works? Other structs with the same attribute work how I expect at other parts of my kernel. I can post a link to the full source if needed.

Edit: The full source is in this gitlab repository, on the stack-overflow branch. The relevant part is lines 34 to 52 of src/kernel/main.c. I'm sure that the data is being populated right, as I check *(uint8_t *)&e and *((uint8_t *)&e + 11). When I run it, the following is output by that part:

(void *)e.name - *(void *)&e
  => 0
*(uint8_t *)&e
  => 80
e.name[0]
  => 0
(void *)&e.attrib - (void *)&e
  => 11
*((uint8_t *)&e + 11)
  => 8
e.attrib
  => 0

I'm very confused about why e.name[0] would be any different than *(uint8_t *)&e.

Edit 2: I disassembled this part using objdump, to see what the difference was in the compiled code, but now I'm even more confused. u8_dec(*(uint8_t *)&e, nbuf); and u8_dec(e.name[0], nbuf); are both compiled to: (comments mine)

lea   eax, [ebp - 0x30] ;loads address of e from stack into eax
movzx eax, byte [eax]   ;loads byte pointed to by eax into eax, zero-extending
movzx eax, al           ;not sure why this is here, as it's already zero-extended
sub esp, 0x8
push  0x31ce0 ;nbuf
push  eax     ;the byte we loaded
call  0x3162f ;u8_dec
add esp, 0x10

This passes in the first byte of the struct, as expected. I'm sure that u8_dec doesn't modify e, as its first argument is passed by value and not by reference. nbuf is an array declared at file-scope, while e is declared at function scope, so it's not that they overlap or anything. Perhaps u8_dec isn't doing its job right? Here's the source of that:

void u8_dec(uint8_t n, uint8_t *b) {
  if (!n) {
    *(uint16_t *)b = '0';
    return;
  }
  bool zero = false;
  for (uint32_t m = 100; m; m /= 10) {
    uint8_t d = (n / m) % 10;
    if (zero)
      *(b++) = d + '0';
    else if (d) {
      zero = true;
      *(b++) = d + '0';
    }
  }
  *b = 0;
}

It's pretty clear now that packed structs do work how I think they do, but I'm still not sure what's causing the problem. I'm passing the same value to a function that should be deterministic, but I'm getting different results on different calls.

Benji Dial
  • 131
  • 7
  • 4
    Please show the code. Please create an [MCVE] – KamilCuk May 24 '20 at 21:24
  • Are you sure the contents you're getting from the disk are what you expect? – Joseph Sible-Reinstate Monica May 24 '20 at 21:24
  • 1
    *`(void *)&e.attrib - (void *)&e`* is not proper C code. You can't do pointer arithmetic using `void *` pointers. – Andrew Henle May 24 '20 at 21:29
  • 1
    @AndrewHenle: You can with gcc, and the question is already tagged [tag:gcc] and using other gcc extensions... – Nate Eldredge May 24 '20 at 21:32
  • 3
    Your understanding of `__attribute__((packed))` is correct, so your bug is likely something else. – Nate Eldredge May 24 '20 at 21:35
  • @AndrewHenle: See, this is why I keep telling people that the C standard forbids very little. You say “can”t” but in fact here is somebody doing it, proving that you can. *Undefined* behavior is not **forbidden** behavior. – Eric Postpischil May 24 '20 at 21:57
  • I don't see anything obviously wrong in the code either. I think it's time to start single-stepping it with a debugger and seeing what actually happens. I tried briefly to get your code running in an emulator but didn't immediately succeed. – Nate Eldredge May 24 '20 at 22:42
  • "not sure why this is here, as it's already zero-extended": just because you're compiling without optimization and gcc very often generates redundant code like this when not optimizing. – Nate Eldredge May 24 '20 at 22:45
  • Haven't really followed the code, but the observed behavior could be consistent with a buffer overflow in `vga_printsz` or `u8_dec`. – dxiv May 24 '20 at 22:57
  • `*(uint16_t *)b = '0';` is a strict aliasing violation (as well as being fishy) – M.M May 24 '20 at 22:59
  • Not that it proves anything, but I converted the relevant part of your code as best I could into a regular userspace program, and it ran correctly. – Nate Eldredge May 25 '20 at 01:18
  • 3
    Now that you have an answer, please consider writing it as **an answer**. StackOverflow is not a forum where you mark solved questions by editing the title. You write an answer and mark it. – the busybee May 25 '20 at 06:01

1 Answers1

4

My kernel utilizes 32-bit protected mode segmenting. I had my data segment as 0x0000.0000 - 0x000f.ffff and my stack segment as 0x0003.8000 - 0x0003.ffff, to trigger a general protection fault if the stack over overflowed, rather than allowing it to overflow into other kernel data and code.

However, when GCC compiles C code, it assumes that the stack and data segments have the same base, as this is most often the case. This was causing a problem as when I took the address of the local variable, it was relative to the stack segment (as local variables are on the stack), but when I dereferenced the pointer in the function that was called, it was relative to the data segment.

I have changed my segmenting model so that the stack is in the data segment instead of its own segment, and this has fixed the problem.

Benji Dial
  • 131
  • 7
  • This was interesting. So when you did `struct directory_entry e; fs_read(root, 32, &e);` the data was not read into `e` (on the stack) but into some other area of memory, since `fs_read` used the pointer relative to ds. When you did `u8_dec(*(uint8_t *)&e, nbuf)`, the compiler generated `lea eax, [ebp - 0x30] ; movzx eax, byte [eax]`. The `[eax]` is relative to `ds`, so this effectively made the same translation and fetched from the "wrong" area, and thus you saw the value you expected. [...] – Nate Eldredge May 25 '20 at 13:42
  • On the other hand, `u8_dec(e.name[0], nbuf);` compiled into `movzx eax, byte [ebp - 0x30]`. This is ss-relative, so it actually did fetch from the "correct" address on the stack, but of course the expected data wasn't there. – Nate Eldredge May 25 '20 at 13:43
  • Indeed, the only way for the compiler *not* to assume that ss and ds have the same base would be to use "far" 48-bit seg:ofs pointers for everything, like the 8086 "large" model. This would be very inefficient, since you'd have to reload one of the segment register for every access, which triggers a whole bunch of internal checks. I don't know if there ever existed any 32-bit x86 compiler that could do this. Certainly gcc never did. – Nate Eldredge May 25 '20 at 13:52