
I'm writing an assembler for a language I'm making up as I go along, and I'm parsing labels.

Labels start with an octothorpe (#) and end with a whitespace character, so whenever I encounter a # while parsing, I call my make_label function.

make_label looks something like:

uint32_t make_label(FILE *f) {

  uint8_t i=0;
  char c;
  char buffer[64];
  while ( (c = fgetc(f)) != ' ') {
    buffer[i++] = c;
  }

  // Do thing with label
  return 1;
}

There's a bit more to it, but that's the general gist. There's a bug as written, which uncovered some weird behaviour I don't quite understand.

I forgot to '\0' terminate the buffer. When I examined the labels, given an input like:

#start
...
#loop
...
#ignore
...
#end
...

I would see the labels:

start
loopt
ignore
endore

The buffer variable was keeping its value between calls.

I'm not sure it really matters, because I realise I should have been adding the null terminator, but I was curious as to why this is happening? I was using printf to output the buffer, and it didn't seem to care that there was no terminator, which is why I didn't notice immediately.

Is this all just dumb luck? The array as declared just happened to be zeroed, and each call just happened to allocate the same block on the stack?

MalphasWats
  • Undefined behavior (reading uninitialized values), so you shouldn't spend too much time worrying about it. What's probably happening is that your calls to `make_label` are at the same depth in the stack, nothing is overwriting this shared stack, and the data is still there. It's easier to leave on the stack than zero out. This behavior could change with the next compiler version, or if you use different flags, or run on a different system. If there's a security or privacy concern, then you'd want to sanitize the stack before returning. – JohnFilleau Feb 05 '23 at 15:47
  • It doesn't "persist". It just didn't get stomped on. – Andrew Henle Feb 05 '23 at 15:47
  • Stack variables pick up whatever garbage happens to be on the stack when they're allocated. If you're calling the same function, and nothing else overwrote the stack in between, then it *may* contain some or all of the data from the previous call. But you can't rely on that. In any case, it's not particularly interesting, and certainly not mysterious. Garbage is garbage. – Tom Karzes Feb 05 '23 at 15:49
  • (a) That is lifetime, not scope. An object defined inside a block (the body of a function definition is one block, and there are others) without a storage-class specifier has automatic storage duration. (b) The identifier `buffer` has block scope, not function scope. (Only `goto` labels have function scope.) (c) Lifetime is the period of time that memory is **reserved** for an object. When your function exits, the memory for `buffer` is no longer reserved. When your function is called again in the same circumstances, the program reuses memory it used before, simply because that is simple… – Eric Postpischil Feb 05 '23 at 16:16
  • … So the data is still there. This is simply memory remembering things, which is what memory does. That explains the observed behavior. (You cannot rely on that behavior if any aspect of the circumstances changes.) – Eric Postpischil Feb 05 '23 at 16:16
  • Obligatory link to a [classic SO analogy about a previously-rented hotel room](https://stackoverflow.com/questions/6441218#6445794). – Steve Summit Feb 09 '23 at 14:45
  • @SteveSummit Reading that makes everything seem stupidly obvious. Thank you. – MalphasWats Feb 20 '23 at 15:11

1 Answer

> Is this all just dumb luck? The array as declared just happened to be zeroed, and each call just happened to allocate the same block on the stack?

Yep, seems like it!

To address both parts of the question:

> The array as declared just happened to be zeroed

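This is not so surprising. According to my vague memories of contemporary operating system design, backed up by the answers to the Stack Overflow question "Kernel zeroes memory?", the kernel typically zeroes every page before handing it to a process, for security reasons. So if you haven't touched that part of the stack before, it will probably be 0. That is also why printf seemed not to care: when printing a string it stops at the first zero byte it reaches, and there happened to be one right after your data. (Do not rely on this.)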

> and each call just happened to allocate the same block on the stack

This is not so surprising either. Each call to this function reserves a block of the same size on the stack. Furthermore, in your example it seems like every time you call the function you aren't in the middle of parsing anything else, which implies you aren't in the middle of calling any other functions. The chain of callers is therefore the same each time, nothing else on the stack adds an offset to this call, and the block ends up at the same address. Since nothing overwrites that address between calls, the characters left over from the previous label are still there: "loop" written over "start" reads back as "loopt".
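
To make the overwrite pattern concrete without relying on undefined behaviour, here is a small model of it. It is only a sketch: `slot`, `write_unterminated` and `print_demo` are names I made up, and the static array stands in for the stack block that `buffer` keeps landing in (static storage starts zeroed, much like a fresh stack page). It is not what your compiler literally does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the reused stack block. Static storage is
   zero-initialised, which models an untouched, zeroed stack page. */
static char slot[64];

/* Copies the label into the slot WITHOUT writing a terminator,
   mirroring the loop in the question. Returns the slot. */
const char *write_unterminated(const char *label) {
    size_t n = strlen(label);
    if (n >= sizeof slot) {
        n = sizeof slot - 1;  /* keep the final byte zero so reads stay in bounds */
    }
    memcpy(slot, label, n);
    return slot;
}

/* Feeds the four labels from the question through the same slot. */
void print_demo(void) {
    const char *labels[] = {"start", "loop", "ignore", "end"};
    for (size_t i = 0; i < sizeof labels / sizeof labels[0]; i++) {
        puts(write_unterminated(labels[i]));
    }
}
```

Calling `print_demo()` prints start, loopt, ignore, endore, one per line, which matches what you saw. Each shorter label only replaces the front of the longer one before it, and the first zero byte after the old data ends the string.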

That's just my intuition about what's happening; you can experiment to see if it matches reality.
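
For completeness, here is a minimal sketch of a reader that does not depend on any of that luck. The name `read_label` and its signature are my own assumptions, not your actual make_label. It always writes the terminator, stops at any whitespace or at end of file, and never writes past the buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Reads characters from f until whitespace or EOF and stores them in out,
   which has room for cap bytes. The result is always null-terminated when
   cap > 0; characters that do not fit are read and dropped.
   Returns the number of characters stored. */
size_t read_label(FILE *f, char *out, size_t cap) {
    size_t i = 0;
    int c;  /* int rather than char, so EOF is distinguishable from data */

    if (cap == 0) {
        return 0;  /* no room even for the terminator */
    }
    while ((c = fgetc(f)) != EOF && !isspace(c)) {
        if (i + 1 < cap) {
            out[i++] = (char)c;
        }
    }
    out[i] = '\0';
    return i;
}
```

With `char buffer[64];` declared in the caller, `read_label(f, buffer, sizeof buffer)` leaves a proper C string in `buffer` no matter what was on the stack beforehand.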