Recently, I wrote the following buggy C code:

```c
#include <stdio.h>

struct IpAddr {
  unsigned char a, b, c, d;
};

struct IpAddr ipv4_from_str (const char * str) {
  struct IpAddr res = {0};
  /* Bug: %d expects an int *, but each argument here is an unsigned char *. */
  sscanf(str, "%d.%d.%d.%d", &res.a, &res.b, &res.c, &res.d);
  return res;
}

int main (int argc, char ** argv) {
  struct IpAddr ip = ipv4_from_str("192.168.1.1");
  printf("Read in ip: %d.%d.%d.%d\n", ip.a, ip.b, ip.c, ip.d);
  return 0;
}
```

The bug is that I use `%d` in `sscanf` while supplying pointers to 1-byte `unsigned char` objects. `%d` expects a pointer to a 4-byte `int`, so each conversion stores 4 bytes into a 1-byte object, and the stores for `b`, `c`, and `d` run past the end of the struct. An out-of-bounds write is definitely undefined behavior, and here the program crashes.

What confuses me is that the bug does not behave consistently. Over 1,000 runs, the program segfaults before the print statement half of the time and after it the other half. I don't understand why this can change. What differs between two invocations of the program? I was under the impression that the memory layout of the stack is the same on every run, and small test programs I've written seem to confirm this. Is that not the case?

I'm using gcc v11.3.0 on Debian bookworm, kernel 5.14.16-1, and I compiled without any flags set.

Here is the assembly output from my compiler for reference.

Carson
    Address Space Layout Randomization: https://www.techtarget.com/searchsecurity/definition/address-space-layout-randomization-ASLR – Barmar May 16 '22 at 20:33
    @Barmar that's it. Disabling ASLR makes the failures consistent. I guess I misunderstood what ASLR actually does. If you post an answer, I will mark it as correct; otherwise I will write up an explanation and credit you. – Carson May 16 '22 at 20:44
    Your function `ipv4_from_str` overwrites the low 3 bytes of the saved frame pointer `rbp`, so upon its return, `rbp` is corrupted. If it points to an invalid address, you crash on the very next instruction `movl %eax, -4(%rbp)`. If it points to a valid address, you continue on with garbage values, but `leave` at the end of `main` moves `rbp` to `rsp`. Then the return address is popped from some random location in memory, and most likely points somewhere that isn't executable code (zero in my test run), so you crash at that point. – Nate Eldredge May 16 '22 at 20:55

1 Answer

Undefined behavior means that anything can happen, including results that differ from one run to the next.

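In practice, this inconsistency is most likely due to Address Space Layout Randomization (ASLR), which places the stack, heap, and shared libraries at different randomized addresses each time the program starts. Depending on where the data ends up in memory, the out-of-bounds writes may or may not hit unmapped memory or overwrite a critical pointer. For example, the saved frame pointer that the comments describe being overwritten holds a stack address, so its value, and therefore the value it gets corrupted into, changes from run to run.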

See also "Why don't I get a segmentation fault when I write beyond the end of an array?"

Barmar