22

I recently learned the spiral rule for deobfuscating complex declarations, that should have been written with a series of typedefs. However, the following comment alarms me:

A frequently cited simplification, which only works for a few simple cases.

I do not find void (*signal(int, void (*fp)(int)))(int); a "simple case". Which is all the more alarming, by the way.

So, my question is, in which situations will I be correct to apply the rule, and in which it would be in error?

Vorac
  • 8,726
  • 11
  • 58
  • 101
  • 3
    I suggest you ask `Vorac` when it goes wrong; I'm not aware of a circumstance under which it goes wrong. – Jonathan Leffler Apr 28 '13 at 06:56
  • @Jonathan Leffler , so there are no known problems and the spiral rule is never in error? – Vorac Apr 28 '13 at 07:26
  • 1
    OK — so it is [James Kanze](http://stackoverflow.com/users/649665/james-kanze) who has [views](http://stackoverflow.com/questions/7526152/easy-rule-to-read-complicated-const-declarations/7526298#comment23257858_7526298) on the article posted in 1994 (not you; apologies for misreading attributions). I don't know what James's objection to it is; I don't know of a circumstance under which it fails (but I've not studied it hard). That is not the same as saying there are no such circumstances, but I'd regard an unexplained, uncorroborated comment like that as so much smoke. – Jonathan Leffler Apr 28 '13 at 07:36
  • 1
    @JonathanLeffler It goes wrong even in some simple declarations, like `int* a[][10];`. Or else, you redefine "spiral", and end up with something even worse than the actual definition. – James Kanze Apr 28 '13 at 17:09
  • 3
    @JamesKanze In the said case `int* a[][10]` the [right-left rule](http://ieng9.ucsd.edu/~cs30x/rt_lt.rule.html) works. Do you know of cases where this rule is futile too like the spiral rule? Bascially I'm trying to find one rule to rule them all :) – legends2k Oct 09 '13 at 13:46
  • [Right-left rule](https://www.codeproject.com/articles/7042/%2fArticles%2f7042%2fHow-to-interpret-complex-C-C-declarations#right_left_rule), above link is unreachable. – legends2k Nov 21 '18 at 10:34

4 Answers4

23

Basically speaking, the rule simply doesn't work, or else it works by redefining what is meant by spiral (in which case, there's no point in it. Consider, for example:

int* a[10][15];

The spiral rule would give a is an array[10] of pointer to array[15] of int, which is wrong. It the case you cite, it doesn't work either; in fact, in the case of signal, it's not even clear where you should start the spiral.

In general, it's easier to find examples of where the rule fails than examples where it works.

I'm often tempted to say that parsing a C++ declaration is simple, but nobody who has tried with complicated declarations would believe me. On the other hand, it's not as hard as it is sometimes made out to be. The secret is to think of the declaration exactly as you would an expression, but with a lot less operators, and a very simple precedence rule: all operators to the right have precedence over all operators to the left. In the absence of parentheses, this means process everything to the right first, then everything to the left, and process parentheses exactly as you would in any other expression. The actual difficulty is not the syntax per se, but that it results is some very complex and counterintuitive declarations, in particular where function return values and pointers to functions are involved: the first right, then left rule means that operators at a particular level are often widely separated, e.g.:

int (*f( /* lots of parameters */ ))[10];

The final term in the expansion here is int[10], but putting the [10] after the complete function specification is (at least to me) very unnatural, and I have to stop and work it out each time. (It's probably this tendency for logically adjacent parts to spread out that lead to the spiral rule. The problem is, of course, that in the absence of parentheses, they don't always spread out—anytime you see [i][j], the rule is go right, then go right again, rather than spiral.)

And since we're now thinking of declarations in terms of expressions: what do you do when an expression becomes too complicated to read? You introduce intermediate variables in order to make it easier to read. In the case of declarations, the "intermediate variables" are typedef. In particular, I would argue that any time part of the return type ends up after the function arguments (and a lot of other times as well), you should use a typedef to make the declaration simpler. (This is a "do as I say, not as I do" rule, however. I'm afraid that I'll occasionally use some very complex declarations.)

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
James Kanze
  • 150,581
  • 18
  • 184
  • 329
  • "In the absence of parentheses, this means process everything to the left first, then everything to the right". Isn't it supposed to be the other way around? As you said it yourself, all operators to the right have precedence over all operators to the left. So it should be: first - to the right, then - to the left. – AnT stands with Russia Feb 28 '15 at 06:29
  • @AnT Yes. I definitely mistyped that statement. I'll correct it, thanks. (And it is surprising that no one noticed the error until now.) – James Kanze Feb 28 '15 at 21:11
  • If you squint, this is a spiral: "process everything to the right first, then everything to the left, and process parentheses exactly as you would in any other expression" – creanion Apr 12 '23 at 08:40
9

The spiral rule is actually an over-complicated way of looking at it. The actual rule is much simpler:

postfix is higher precedence than prefix.

That's it. That's all you need to remember. The 'complex' cases are when you have parenthesis to override that postfix-higher-than-prefix precedence, but you really just need to find the matching parenthesis, then look at the things inside the parens first, and, if that is not complete, pull in the next level outside the parenthses, postfix first.

So looking at your complex example

void (*signal(int, void (*fp)(int)))(int);

we can start at any name and figure out what that name is. If you start at int, you're done -- int is a type and you can understand it by itself.

If you start at fp, fp is not a type, its a name being declared as something. So look at the first set of parens enclosing:

                        (*fp)

there's no suffix (deal with postfix first), then the prefix * means pointer. Pointer to what? not complete yet so look out another level

                   void (*fp)(int)

The suffix first is "function taking an int param", then the prefix is "returning void". So we have fn is "pointer to function taking int param, returning void"

If we start a signal, the first level has a suffix (function) and a prefix (returning pointer). Need the next level out to see what that points to (function returning void). So we end up with "function with two params (int and pointer to function), returning pointer to function with one (int) param, returning void"

Chris Dodd
  • 119,907
  • 13
  • 134
  • 226
6

The rule is correct. However, one should be very careful in applying it.

I suggest to apply it in a more formal way for C99+ declarations.

The most important thing here is to recognize the following recursive structure of all declarations (const, volatile, static, extern, inline, struct, union, typedef are removed from the picture for simplicity but can be added back easily):

base-type [derived-part1: *'s] [object] [derived-part2: []'s or ()]

Yep, that's it, four parts.

where

  base-type is one of the following (I'm using a bit compressed notation):
    void
    [signed/unsigned] char
    [signed/unsigned] short [int]
    signed/unsigned [int]
    [signed/unsigned] long [long] [int]
    float
    [long] double
    etc

  object is
      an identifier
    OR
      ([derived-part1: *'s] [object] [derived-part2: []'s or ()])

  * is *, denotes a reference/pointer and can be repeated
  [] in derived-part2 denotes bracketed array dimensions and can be repeated
  () in derived-part2 denotes parenthesized function parameters delimited with ,'s
  [] elsewhere denotes an optional part
  () elsewhere denotes parentheses

Once you've got all 4 parts parsed,

  [object] is [derived-part2 (containing/returning)] [derived-part2 (pointer to)] base-type 1.

If there's recursion, you find your object (if there's any) at the bottom of the recursion stack, it'll be the inner-most one and you'll get the full declaration by going back up and collecting and combining derived parts at each level of recursion.

While parsing you may move [object] to after [derived-part2] (if any). This will give you a linearized, easy to understand, declaration (see 1 above).

Thus, in

char* (**(*foo[3][5])(void))[7][9];

you get:

  1. base-type = char
  2. level 1: derived-part1 = *, object = (**(*foo[3][5])(void)), derived-part2 = [7][9]
  3. level 2: derived-part1 = **, object = (*foo[3][5]), derived-part2 = (void)
  4. level 3: derived-part1 = *, object = foo, derived-part2 = [3][5]

From there:

  1. level 3: * [3][5] foo
  2. level 2: ** (void) * [3][5] foo
  3. level 1: * [7][9] ** (void) * [3][5] foo
  4. finally, char * [7][9] ** (void) * [3][5] foo

Now, reading right to left:

foo is an array of 3 arrays of 5 pointers to a function (taking no params) returning a pointer to a pointer to an array of 7 arrays of 9 pointers to a char.

You could reverse the array dimensions in every derived-part2 in the process as well.

That's your spiral rule.

And it's easy to see the spiral. You dive into the ever more deeply nested [object] from the left and then resurface on the right only to note that on the upper level there's another pair of left and right and so on.

Alexey Frunze
  • 61,140
  • 12
  • 83
  • 180
  • Interesting. First you say that the rule is correct, then you give an explication of how to read a declaration which doesn't use it, and in fact contradicts it. – James Kanze Apr 28 '13 at 17:05
  • You might mention that when you recurse (I'd use the term nest, but it comes down to the same thing), nested parts of the declaration don't contain your `base-part` (except in function parameters, where you have a complete declaration nested in another declaration). – James Kanze Apr 28 '13 at 17:07
  • @JamesKanze That which you want to mention should be immediately apparent. Observe that there's no `base-type` anywhere under `object is...`. – Alexey Frunze Apr 29 '13 at 01:46
  • Well, it's obvious to you and to me, but it might be worth mentionning to others, who are having problems parsing declarations. – James Kanze Apr 29 '13 at 07:55
  • @JamesKanze What makes you think the text allows for additional `base-type's` being somewhere in a declaration (other than in the function parameter list or in an array dimension expression with `sizeof(type)`)? – Alexey Frunze Apr 29 '13 at 08:34
  • Because you're recursing, and you've not given any other syntax for a declaration. – James Kanze Apr 29 '13 at 09:15
  • @JamesKanze What wasn't clear in the `where` part? Or did you ignore it? – Alexey Frunze Apr 29 '13 at 09:18
1

E.g.:

int * a[][5];

This is not an array of pointers to arrays of int.

CB Bailey
  • 755,051
  • 104
  • 632
  • 656