Perl Integer Constants
Integer constants in Perl can be
- in base 16 if they start with
^0x
- in base 2 if they start with
^0b
- in base 8 if they start with
0
- otherwise they are in base 10.
Following that leader is any number of valid digits in that base and also optional underscores.
Note that digit does not mean \p{POSIX_Digit}
; it means \p{Decimal_Number}
, which is really quite different, you know.
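As a quick illustration of the leaders and the optional underscores, here is a minimal sketch. The array name @values is only for illustration, and the digits used are plain ASCII ones.

```perl
use strict;
use warnings;

# One literal per base; underscores are ignored by the parser.
my @values = (
    0x1F,         # base 16: leader 0x
    0b1010,       # base 2:  leader 0b
    017,          # base 8:  leader 0
    1_000_000,    # base 10 with underscores as digit separators
    0xFF_FF,      # underscores work after the other leaders too
);

print join(", ", @values), "\n";
```

All five come out as ordinary integers once compiled; the base and the underscores exist only in the source text.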
Please note that any leading minus sign is not part of the integer constant, which is easily proven by:
$ perl -MO=Concise,-exec -le '$x = -3**$y'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <$> const(IV 3) s
4 <$> gvsv(*y) s
5 <2> pow[t1] sK/2
6 <1> negate[t2] sK/1
7 <$> gvsv(*x) s
8 <2> sassign vKS/2
9 <@> leave[1 ref] vKP/REFC
-e syntax OK
See the 3 const
, and much later on the negate
op-code? That tells you a bunch, including a curiosity of precedence: ** binds more tightly than unary minus, so the expression parses as -(3**$y).
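The same thing can be seen at run time rather than in the op tree. This is just a small sketch, and the variable names are made up for the demonstration.

```perl
use strict;
use warnings;

my $y = 2;

# The minus sign is a separate negate op applied after the power.
my $without_parens = -3**$y;      # parsed as -(3**$y)

# Parentheses are needed to raise negative three to a power.
my $with_parens    = (-3)**$y;

print "$without_parens $with_parens\n";
```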
Perl Identifiers
Identifiers specified via symbolic dereferencing have absolutely no restriction whatsoever on their names.
- For example,
100->(200)
calls the function named 100
with the argument list (200)
.
- For another,
${"What’s up, doc?"}
refers to the scalar package variable by that name in the current package.
- On the other hand,
${"What's up, doc?"}
refers to the scalar package variable whose name is ${"s up, doc?"}
and which is not in the current package, but rather in the What
package. Well, unless the current package is the What
package, of course. Similarly, $Who's
is the $s
variable in the Who
package.
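Here is a hedged sketch of symbolic dereferencing with names that could never be typed as bare identifiers. It assumes strict refs is switched off in the enclosing scope. The sub body and the variable name "any name, really?" are invented for the example, and it deliberately avoids the apostrophe separator, which newer perls complain about.

```perl
use strict;
use warnings;
no strict 'refs';    # symbolic references are refused under strict refs

# Install a sub whose name is the string "100" (illustrative only).
*{"100"} = sub { return "called with (@_)" };

# Symbolic code dereference: calls the function named 100.
my $result = 100->(200);
print "$result\n";

# A scalar package variable whose name has spaces and punctuation.
${"any name, really?"} = "stored";
my $value = ${"any name, really?"};
print "$value\n";
```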
One can also have identifiers of the form ${^
identifier}
; these are not considered symbolic dereferences into the symbol table.
Identifiers consisting of a single character can be a punctuation character, including $$
or %!
.
Identifiers can also be of the form $^C
, which is written either with a literal control character or as a circumflex followed by a non-control character.
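A few of the built-in variables show these forms in action. This is only a sketch: the lexical names on the left are made up, while $$, $^O, ${^TAINT} and %! are standard Perl variables.

```perl
use strict;
use warnings;

my $pid        = $$;                  # single punctuation character
my $os         = $^O;                 # circumflex followed by a letter
my $taint      = ${^TAINT};           # the ${^identifier} form
my $has_enoent = exists $!{ENOENT};   # %! is a punctuation-named hash

print "pid=$pid os=$os taint=$taint\n";
print "ENOENT known: ", ($has_enoent ? "yes" : "no"), "\n";
```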
If none of those things is true, a (non–fully qualified) identifier follows the Unicode rules related to characters with the properties ID_Start
followed by those with the property ID_Continue
. However, it overrules this in allowing all-digit identifiers and identifiers that start with (and perhaps have nothing else beyond) an underscore. You can generally pretend (but it’s really only pretending) that that is like saying \w+
, where \w
is as described in Annex C of UTS#18. That is, anything that has any of these:
- the Alphabetic property — which includes far more than just Letters; it also contains various combining characters and the Letter_Number code points, plus the circled letters
- the Decimal_Number property, which is rather more than merely
[0-9]
- Any and all characters with the Mark property, not just those marks that are deemed Other_Alphabetic
- Any characters with the Connector_Punctuation property, of which underscore is just one.
So either ^\d+$
or else
^[\p{Alphabetic}\p{Decimal_Number}\p{Mark}\p{Connector_Punctuation}]+$
ought to do it for the really simple ones if you don’t care to explore the intricacies of the Unicode ID_Start and ID_Continue properties. That’s how it’s really done, but I bet your instructor doesn’t know that. Perhaps one shan’t tell him, eh?
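Those two patterns can be wrapped up as a rough checker. Treat it as a sketch of the "pretend it is \w+" approximation, not of the real ID_Start and ID_Continue rules; the function name looks_like_simple_identifier is invented here, and \A and \z are used instead of ^ and $ so that a trailing newline does not sneak through.

```perl
use strict;
use warnings;

# Approximate test for simple (unqualified) identifiers.
# Caveat: the second pattern also accepts names such as "1abc",
# which is part of why this is only pretending.
sub looks_like_simple_identifier {
    my ($name) = @_;
    return 1 if $name =~ /\A\d+\z/;
    return 1 if $name =~ /\A[\p{Alphabetic}\p{Decimal_Number}\p{Mark}\p{Connector_Punctuation}]+\z/;
    return 0;
}

binmode(STDOUT, ":encoding(UTF-8)");

# "na\x{EF}ve" contains U+00EF, a non-ASCII Alphabetic code point.
for my $candidate ("foo", "_", "123", "na\x{EF}ve", "foo-bar", "") {
    my $verdict = looks_like_simple_identifier($candidate) ? "ok" : "no";
    print "'$candidate': $verdict\n";
}
```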
But you should cover the nonsimple ones I described earlier.
And we haven’t talked about packages yet.
Perl Packages in Identifiers
Beyond those simple rules, you must also consider that identifiers may be qualified with a package name, and package names themselves follow the rules of identifiers.
The package separator is either ::
or '
at your whim.
You do not have to specify a package if it is the first component in a fully qualified identifier, in which case it means the package main
. That means things like $::foo
and $'foo
are equivalent to $main::foo
, and isn't_it()
is equivalent to isn::t_it()
.
Finally, as a special case, a trailing double-colon (but not a single-quote) at the end of a hash name is permitted, and this then refers to the symbol table of that name.
Thus %main::
is the main
symbol table, and because you can omit main, so too is %::
.
Meanwhile %foo::
is the foo
symbol table, as is %main::foo::
and also %::foo::
just for perversity’s sake.
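The equivalences can be checked by comparing references. This sketch sticks to the :: separator, and the package foo with its variable $bar is made up for the purpose.

```perl
use strict;
use warnings;

our $foo = "in main";                     # a package variable in main
{ package foo; our $bar = "in foo"; }     # illustrative package foo

# An omitted leading package name means main.
my $same_scalar = (\$::foo) == (\$main::foo);

# A hash name with a trailing :: is the symbol table itself.
my $same_main = (\%main::) == (\%::);
my $same_foo  = (\%foo::) == (\%main::foo::)
             && (\%foo::) == (\%::foo::);

# Symbol tables are hashes keyed by symbol name.
my $bar_listed = exists $foo::{bar};

print "same scalar: $same_scalar\n";
print "same main table: $same_main\n";
print "same foo table: $same_foo\n";
print "bar listed in foo: $bar_listed\n";
```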
Summary
It’s nice to see instructors giving people non-trivial assignments. The question is whether the instructor realized it was non-trivial. Probably not.
And it’s hardly just Perl, either. Regarding the Java identifiers, did you figure out yet that the textbooks lie? Here’s the demo:
$ perl -le 'print qq(public class escape { public static void main(String argv[]) { String var_\033 = "i am escape: ^\033"; System.out.println(var_\033); }})' > escape.java
$ javac escape.java
$ java escape | cat -v
i am escape: ^[
Yes, it’s true. It is also true for many other code points, especially if you use -encoding UTF-8
on the compile line. Your job is to find the pattern that describes these startlingly unforbidden Java identifiers. Hint: make sure to include code point U+0000.
There, aren’t you glad you asked? Hope this helps. Or something. ☺