perl -E 'say for map s/(æ|ø|å)/   {qw(æ ae ø oe å aa)}->{$1}/ger, qw(rød gul blå)'
perl -E 'say for map s/(æ|ø|å)/"".{qw(æ ae ø oe å aa)}->{$1}/ger, qw(rød gul blå)'

The first line above gives me syntax error at -e line 1, near "}->" but the second prints roed, gul and blaa as expected. Is this a weakness of the compiler, or is there some reason for it that I can't see? I tested and got this behaviour in versions 5.10, 5.22 and 5.26.

Kjetil S.

1 Answer


The {...} is interpreted as a BLOCK, not a hashref. We can see this by adding a +:

perl -E'say for map s/(æ|ø|å)/+{qw(æ ae ø oe å aa)}->{$1}/ger, qw(rød gul blå)'

and now it works, since what follows the unary + must be an expression; so + disambiguates the code. Then the interpreter goes on to identify the construct as an anonymous hash constructor.

Otherwise it has to guess at the {, since it can't parse ahead before deciding whether it is parsing a block or an expression. It could analyze the context to determine what {...} is, but I'd find it reasonable if that was simply deemed too complex to be worth the trade-off.

In the other example it is the concatenation operator (.) that does it.


For another example of the unary + forcing treatment of the following code as an expression, and for details about related documentation, see this post.

zdim
  • Thx for the `+` tip. But couldn't or shouldn't the `->` have prevented it from being interpreted as a block? – Kjetil S. Sep 04 '19 at 08:35
  • @Kjetil S., No, because it can only reach the `->` after deciding whether to parse what starts with `{` as a block or a hash constructor. Remember, a block contains statements, whereas the hash constructor contains a (possibly null) expression, so they use different parsing rules. Since they're different, the parser can't simply parse the contents of the curlies then make a decision by looking at what follows the `}`. It does look ahead at the token after the `{` to guess at what the `{` means, but to look further ahead is simply not productive. – ikegami Sep 04 '19 at 13:55
  • @ikegami I don't know the Perl parser/compiler, but I know enough about parsing to think that my simple example code is well within the reach of a traditional backtracking recursive parser. I suspect the Perl compiler already has many rules which use such backtracking (and not only look-ahead to try to resolve ambiguous tokens straight away) where it's not immediately clear what it's looking at and it has to rewind to try something else. I'm guessing that this case was deemed too rare, is on a todo list or was overlooked. – Kjetil S. Sep 04 '19 at 19:00
  • @Kjetil S., First, Perl uses an LR parser, not an LL parser, so backtracking isn't a thing. That's a good thing. When writing an LL parser, you try to eliminate all backtracking. It's slow, and it leads to poor error handling. Aside from the poor error handling, it would lead to other major problems: Perl's syntax often depends on whether a statement, expression or operator is expected in order to remove ambiguities, so your solution to remove ambiguities would actually add some. – ikegami Sep 04 '19 at 19:32
  • @Kjetil S., Also, you're wrong that `->{$1}` would disambiguate because `-` is perfectly valid after a block. So you'd only discover the error after another level of recursion, which means you'd mishandle that much more bad code. Attempting what you suggest would not be productive. It just changes the kinds of guesses the parser has to make (what is this vs where did the error actually happen?), which simply shifts when you get weird errors. At least the way it is now is predictable, and avoidable once a programmer is made aware. It's a lot harder to ask programmers not to make syntax errors. – ikegami Sep 04 '19 at 19:38
  • I understand that the second part of the `s///e` is attempted to be interpreted as a BLOCK when it starts with `{`. My question is why. `$a={1..8}->{5}` is ok (6), then again `print({1..8}->{5})` gives the same error while `print "",{1..8}->{5}` doesn't. Might be about scalar context? But `@a=({1..8}->{5})` works. Wouldn't it be more natural to add a `do` as in `s//do{...}/e` if we wanted the second part to be read as a block and a missing `do` should default to the `{` being interpreted as the start of a hashref? – Kjetil S. Oct 16 '19 at 23:07
  • @KjetilS. "_question is why_" --- again, the best I can think of is that "_it has to guess at `{` ..._" (from the answer). As for these other examples, I don't know enough about how parsing works to discern in detail how it guesses. In most of them the `{}` seems to be taken as a block, with "_uninitialized_" warnings since `1..8` doesn't return a reference. Perhaps it is the context that guides the guess? Or some heuristics? On the practical side, all these examples clearly push their luck and should be written more properly so that nobody has to guess. – zdim Oct 17 '19 at 16:55
  • @KjetilS. Btw, the `print({1..8}->{5})` for me simply fails as syntax error. All these examples could be just discarded by the compiler, as "ambiguous." But that would then probably cripple some useful legitimate (I mean clear) uses so it tolerates it ... and guesses. – zdim Oct 17 '19 at 17:00