How does the compiler handle case statements?

Question

In the documentation under case statements, it says:

Each value represented by a caseList must be unique in the case statement;

And the example shown,

case I of
  1..5: Caption := 'Low';
  6..9: Caption := 'High';
  0, 10..99: Caption := 'Out of range';
else
  Caption := ;
end

is equivalent to the nested conditional:

if I in [1..5] then
  Caption := 'Low';
else if I in [6..10] then
  Caption := 'High';
else if (I = 0) or (I in [10..99]) then
  Caption := 'Out of range'
else
  Caption := ;

So the first quote suggests it is handled like a set (read the comments here at least one person is with me on this).

Now I know the part

where selectorExpression is any expression of an ordinal type smaller than 32 bits

contradicts with the properties of sets as it is mentioned her under sets:

The base type can have no more than 256 possible values, and their ordinalities must fall between 0 and 255

What is really bugging me is that why it is a necessity to have a unique values in the caseList. If it is equivalent to the if statement, the second value would be just not tested because the compiler already found a prior match?

There's no contradiction. A valid case statement can always be written as an equivalent if statement. But if statements are more general. If the selector values are too big for a set then they can be written using comparison operators instead. Don't get hung up on that if statement in the documentation. It is just there to illustrate the point. — David Heffernan, Mar 19 '17 at 09:01
I have tried a test example, when in debugger mod (when using f7) in the case statement it jumps directly to the matching value but in the if statement when using F7 it is going instruction by instruction, I do not have much knowledge about assembly but there is a lot of things happening before the first case is handled. — Nasreddine Galfout, Mar 19 '17 at 09:11
The compiler will generate different code for a case statement. But the behaviour is equivalent to that if. — David Heffernan, Mar 19 '17 at 09:13
See the reference to jump tables here:https://books.google.co.uk/books?id=8J-MqM89t0IC&pg=PT472&lpg=PT472&dq=delphi+jump+table&source=bl&ots=vYb-0Ja5sO&sig=xKWiqiow1UnBDaql7MwQUJwEEss&hl=en&sa=X&ved=0ahUKEwjnnITzn-LSAhWLI8AKHZwtB7IQ6AEIQDAG#v=onepage&q=delphi%20jump%20table&f=false — MartynA, Mar 19 '17 at 09:27
@DavidHeffernan that is the reason why I'm asking. from Johan's answer I'm getting (not sure) that the jump table is the things happening before the first case is handled — Nasreddine Galfout, Mar 19 '17 at 14:39
There might be a jump table. There might not. Why do you care? — David Heffernan, Mar 19 '17 at 14:41
@DavidHeffernan it is not about if I care or not, I want to know what is happening so I could chose better between the tools I have — Nasreddine Galfout, Mar 19 '17 at 14:47
That's what I'm asking. Do you want to know semantically what a case statement is? Or is performance the issue. It's not at all clear what is driving the question. You seemed to be asking whether or not the case and if statements in the question were equivalent. Which would imply semantics mattered. But now I wonder whether your issue is different. — David Heffernan, Mar 19 '17 at 14:49
@Nasreddine If you're wondering how you'd check uniqueness on a collection of items, there are many ways. The best choice depends on the properties of the data you're trying to check. I've added a reasonable _guess_ as to how Delphi's compiler might perform the check for the kind of data it's likely to process. But if you have a problem in that area, please post a **new question** showing what you've tried and explaining what you're struggling with. (NB: Remember the nature of the data you're trying to check is very important.) — Disillusioned, Mar 19 '17 at 17:13
@DavidHeffernan the first time I read the documentation, I really thought the compiler would simply rewrite the case statement as a set of if statements, and the reason we have a case statement in the first place is just to save the programmer typing time and make the code more readable. after I saw the behavior in the debugger, I knew I'm messing something very interesting.(this is what is driving the question) — Nasreddine Galfout, Mar 20 '17 at 06:15
@CraigYoung I was thinking of asking such question, however that would be in some other time after I do all the research necessary.. know it is all about that case statement — Nasreddine Galfout, Mar 20 '17 at 06:19
The compiler could do that. It could also rewrite the if statement as a case statement. Perhaps it does for some versions of the compiler. — David Heffernan, Mar 20 '17 at 07:01

score 10 · Accepted Answer · edited May 23 '17 at 12:25

The documentation takes a specific case statement which is equivalent to a specific if statement.

In general, any case statement can be rewritten as an if statement using the same approach. However, the inverse is not true.

The documentation uses an equivalent if statement to explain the logical behaviour (or semantics) of the case statement. It's not a representation of what the compiler does internally.

How does the compiler handle case statement?

First note there are 2 aspects to this question.

Semantically the compiler must handle the case statement as indicated in the documentation. This includes:
- Ensuring the values of each caseList entry can be evaluated at compile-time.
- Ensuring the caseList entries are unique.
- Whichever caseList entry matches, the corresponding caseList-statement is called.
- If no caseList entries match, the else statements are called.
However, the compiler is entitled to optimise the implementation as it sees fit provided the optimised byte/machine code is logically equivalent.
- Johan's answer describes common optimisations: jump-lists and reordering.
- These are much easier to apply given the strict semantics.

What is really bugging me is that why it is a necessity to have a unique values in the caseList.

The uniqueness is required to remove ambiguity. Which caseList-statement should be used if more than one matches?

It could call the first matching caseList-statement and ignore the rest. (SQL Server CASE statement behaves like this.) Also see [1] below.
It could call all of them. (If I remember correctly the MANTIS programming language uses this semantic for its version of a case statement.)
It could report an error requiring the programmer to disambiguate the caseList. (Simply put, this is what Delphi's specification requires. Many other languages use the same approach. Quibbling about it is unproductive especially since this choice is unlikely to be much of a hindrance.)

If it is equivalent to the if statement the second value would be just not tested because the compiler already found a prior match.

[1] I'd like to point out that this can make code much more difficult to read. This behaviour is "ok" when using magic literals, but when using const identifiers it becomes dangerous. If 2 different consts have the same value, it won't be immediately apparent that the latter caseList-statement where the caseList also matches won't be called. Also case statements would be subject to behaviour change due to a simple caseList re-ordering.

const
  NEW_CUSTOMER = 0;
  EDIT_CUSTOMER = 1;
  ...
  CANCEL_OPERATION = 0;

case UserAction of
  NEW_CUSTOMER : ...;
  EDIT_CUSTOMER : ...;
  ...
  CANCEL_OPERATION : ...; { Compiler error is very helpful. }
end;

Contradicts with the properties of sets

There's no contradiction. The fact that each caseList value must be unique does not in any way imply it must be "handled like a set". That's your incorrect assumption. Other's making the same assumption are likewise incorrect.

How the uniqueness constraint is checked, is up to the compiler. We can only speculate. But I'd guess the most efficient would be to maintain an ordered list of ranges. Work through each of the caseList of values and ranges, find its position in the above list. If it overlaps, then report an error, otherwise add it to the list.

@David I've rolled back your edit. I did not intend a Horizontal Rule. I was trying to simulate a footnote reference (from the bullet "Also see ** below."). I haven't been able to find mark-down support for that. — Disillusioned, Mar 19 '17 at 16:33

Johan · Answer 2 · 2017-03-19T10:33:49.730

6

case statement
The compiler prefers to transform the case statement to a jump table.
In order to make this possible duplicate case labels are not allowed.

Furthermore the compiler does not have to test the case statements in the order you've declared them. For optimization reasons it will reorder these elements as it sees fit; so even if it does not use a jump table, it still cannot allow duplicate case labels.

It is for these same reason (and to keep programmers mentally sane) that the case statement does not allow fall-throughs.

if statement
A (series of) if statements are processed in the order they are declared.
The compiler will test them one by one (and jump out when appropriate).
If the if-statement does not contain duplicates the code will perform the same as an equivalent case statement, but the generated code will most likely be different.
If statements are never converted into jump tables.

about sets
The reason sets are limited to 256 elements is that every element in a set takes up one bit of space.
256 bits = 32 bytes.
Allowing more than 256 elements in a set would grow the representation in memory too much, hampering performance.

The case statement does not use sets, it merely uses set logic in its semantics.

edited Mar 19 '17 at 10:33

answered Mar 19 '17 at 10:27

Johan

74,508
24
191
319

Strictly speaking though, the jump table optimisation could still be used by loading the first matching target into the table. But in general I do agree that the uniqueness constraint does make optimisations easier. – Disillusioned Mar 19 '17 at 13:50
I'm really interested in your answer, can you please give more details about the part "about sets" especially "256 bit = 32 bytes" do you mean that when using "if value in Aset then ...." the compiler is testing one bit not searching for a match in the hall set. – Nasreddine Galfout Mar 19 '17 at 14:28
The compiler is free to do it how it pleases. It could use two comparison operators to test `in [1..5]` for instance, which would likely be the optimal approach. What is driving the question? It seems to be sprawling now. It started on case statements and now you want to know about sets. – David Heffernan Mar 19 '17 at 15:01
@DavidHeffernan it is still about case statement, I'm just curios about that part in the answer, and you are right I think that would need me to ask a new question. – Nasreddine Galfout Mar 20 '17 at 06:26

How does the compiler handle case statements?

2 Answers2