5

I am reading "C in a Nutshell" and there are a lot of sentences similar to this one:

A statement specifies one or more actions to be performed such as assigning a value to a variable, passing control to a function, or jumping to another statement.

My question is what is the thing that "performs" these actions?

I have read here and there that C was defined to run on an abstract machine, so my guess is that the abstract machine is supposed to perform these actions. The job of actual compilers like gcc would then be to ensure that if you evaluate a program mentally, based on the way the abstract machine works, you get the same result as when you actually run the object file generated by the compiler (of course, evaluating a program mentally is not possible in most cases, but I am speaking theoretically here).

So is the abstract machine supposed to interpret C code (after preprocessing) directly? Is C supposed to be translated to some intermediate code that the abstract machine interprets? What exactly is the relationship between the abstract machine and C?

What is the state of the abstract machine visible to programs? Only the main memory? If the abstract machine really interprets C code directly, how are declarations evaluated, and how do they change the state of the abstract machine? This last series of questions only serves to give you an idea of what I mean by the precise relationship between C and its abstract machine.

Lundin
  • 4
    The abstract machine doesn't do anything and it doesn't actually exist. That's why it is called an _abstract_ machine. Google _"c abstract machine"_ – Jabberwocky Nov 01 '18 at 11:20
  • 2
    "how are declarations evaluated, how do they change the state of the abstract machine? " - that is what all the rules in the standard specify – M.M Nov 01 '18 at 11:28

2 Answers

17

The abstract machine does not exist - it is, after all, literally abstract ("existing in thought or as an idea but not having a physical or concrete existence"). The abstract machine is an imaginary machine that precisely follows the rules of the standard.

The C program is compiled by a compiler for a concrete machine, which might (and usually does) have semantics distinct from those of the abstract machine. The actual machine might have things like speculative execution, out-of-order execution and parallelism.

A compliant compiler must produce an executable that, when run, has the same observable behaviour as if the program were executed on said abstract machine following the rules of the standard.

6

The abstract machine is a formal C term for the model of program execution. It is related to the abstract model called a Turing machine and refers to the very core of the language. The abstract machine is defined by the whole of section 5.1.2.3 "Program execution" in C17, where the first line says:

The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.

In other words, the abstract machine is a model for the specified outcome of a program, regardless of optimizations. It specifies the sequencing of expressions (order of execution), the rules for determining whether an optimization is allowed, and the observable behavior of a program.

Very simply put, the abstract machine is what specifies that source code lines are to be read as if executed from the top to bottom of the source file.

Take this example:

int a = 1;
int b = 1;
int c = a + b + 1;
printf("%d", c);

The abstract machine is what specifies that the initializations of a and b are performed first, then the line int c = a + b + 1; and finally the printf. The result must be 3. This means that the compiler is not allowed to re-order these lines if it affects the outcome of the program. There are sequence points at the ; of each line, where all previous calculations must be finished.

The compiler is however free to evaluate the operands a, b and 1 in any order, as they are not sequenced in relation to each other. The order of evaluation is not specified. It may even compute b + 1 first, since for these values regrouping the additions does not change the result. Similarly, it could initialize b before a since the order wouldn't matter.

The compiler is also free to replace the code with c = 1 + 1 + 1; or with c = 3; or just replace it all with printf("3");. None of these would affect the observable behavior of the program, so they would all be valid optimizations to make.

Lundin
  • The sequence points don't specify whether the compiler can reorder the statements. It's the observable behaviour that must be preserved when optimising. In fact, the first two lines can be reordered and the observable behaviour will not change, so a compiler is allowed to reorder them. If it were as you say, i.e. no reordering possible between sequence points (at each ";"), it would be impossible for compilers to do any meaningful optimisation. – ad3angel1s Aug 06 '19 at 22:50
  • @ad3angel1s Where did I say any of that? "This means that the compiler is not allowed to re-order these lines if it affects the outcome of the program" which in turn means it _is_ allowed to re-order if it doesn't. – Lundin Aug 07 '19 at 06:32
  • Here: "This means that the compiler is not allowed to re-order these lines if it affects the outcome of the program. There are sequence points at the ; of each line, where all previous calculations must be finished." The part on the sequence points is unclear at best. – ad3angel1s Aug 07 '19 at 07:57
  • That means that if you first have the row `a = a + b;` and then later `c = a + b;`, the compiler is not allowed to place the `c = a + b;` calculation above the first one, because it would change the meaning of the code. This is the very purpose of sequence points. – Lundin Aug 07 '19 at 08:08