Surprisingly, I don't see an explicit mention of the grammar rule that a numeric literal must start with a decimal digit. It's mentioned indirectly in the section you linked: a hex number prefixed with a $ sign must have a digit after the $ rather than a letter. But they don't say it "must still" start with a digit, which would at least imply that a leading digit is always required.
Earlier, in 3.1, they say that identifiers must start with letters, but they don't say that only identifiers can start with letters. (That wouldn't be true anyway: register names and instruction mnemonics can too. Numeric literals can't.)
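As a rough illustration, here is a minimal Python sketch of the rule for an NASM-like tokenizer. It classifies a token by its first character alone. The name `classify_token` and the category strings are hypothetical, and this is not NASM's actual lexer. The set of name-start characters (letters, `_`, `.`, `?`) is my reading of manual section 3.1.

```python
import string

# Assumed name-start characters (letters, '_', '.', '?'), per my reading of NASM manual 3.1
NAME_START = string.ascii_letters + "_.?"

def classify_token(tok: str) -> str:
    """Classify a token purely by its first character (sketch, not NASM's code)."""
    first = tok[0]
    if first in string.digits:
        # Any token starting with 0-9 is a numeric literal: 200, 0c8h, 0xc8, 1100b
        return "number"
    if first in NAME_START:
        # Identifiers, register names and mnemonics all land here;
        # which one it is gets decided later, by looking the name up.
        return "name"
    return "other"

for tok in ["0c8h", "c8h", "0xc8", "ah", "each", "1234"]:
    print(tok, "->", classify_token(tok))
```

The important property is that the decision never needs to look past the first character. A numeric-looking tail like the h in c8h can't pull a name over into the number category.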
This might be one of those things that's so "obviously" true and well-known (to the developers / manual authors) that they forgot to write it down explicitly anywhere in the manual.
The hex examples do show it, though: they include 0c8h but not c8h. They do show examples in other bases where the leading zero isn't required.
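If you're generating NASM source from a program, the rule works out to "add a 0 only when the first hex digit is a letter." Here is a small Python sketch of that. `nasm_hex_h` is a hypothetical helper, and it only handles non-negative values.

```python
def nasm_hex_h(value: int) -> str:
    """Format a non-negative int as an h-suffixed NASM hex literal (hypothetical helper)."""
    if value < 0:
        raise ValueError("sketch only handles non-negative values")
    digits = format(value, "x")
    # Prepend 0 only when the first hex digit is a letter (a-f);
    # otherwise the token would start with a letter and be read as a name.
    if digits[0] in "abcdef":
        digits = "0" + digits
    return digits + "h"

print(nasm_hex_h(200))         # 0c8h
print(nasm_hex_h(0x12))        # 12h
print(nasm_hex_h(0xFFFFFFBB))  # 0ffffffbbh
```

Always prepending the 0 would also be valid. Emitting the 0x prefix (0xc8) instead sidesteps the question entirely, since NASM accepts that form too.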
Some of the things that make it "obvious" and necessary that tokens starting with an alphabetic character should never be parsed as numeric literals:
AH through DH are register names, so they must not get parsed as numbers. It would be very weird if EH was a numeric literal but DH wasn't. (It's normal for register names to fit the same pattern as symbol names, not numbers. The exception is PowerPC, where GAS syntax just uses bare numbers for both registers and immediates, so you have to remember which operand positions are which for each instruction, or use gcc -mregnames. But that's an IBM architecture, so of course it uses weird conventions, like numbering the bits backwards.)
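To make the collision concrete, this Python check (my own illustration, not anything from NASM) lists which 8-bit register names would also match a naive "hex digits followed by h" pattern if no leading digit were required. The pattern name `HEX_H` and the register list are assumptions of this sketch.

```python
import re

# What a trailing-h hex literal would look like if no leading digit were required.
HEX_H = re.compile(r"[0-9a-f]+h", re.IGNORECASE)

REGS8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

collisions = [r for r in REGS8 if HEX_H.fullmatch(r)]
print(collisions)  # ['ah', 'ch', 'dh', 'bh']

# "eh" is not a register, but it matches the same pattern.
print(bool(HEX_H.fullmatch("eh")))  # True
```

Every high-byte register collides, while eh matches the same pattern without being a register. Without the leading-digit rule, the lexer would need a special case to keep the four registers from being read as numbers, and that special case is exactly the inconsistency described above.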
It would be super weird for abcdefgh to be a symbol name but abcdefh to be a numeric literal (without the g, it's all valid hex digits and a trailing h).
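A hypothetical lexer that skipped the leading-digit rule would flip categories based on a single letter, as this Python sketch shows. `naive_classify` is made up for illustration and is deliberately not how NASM behaves.

```python
import re

HEX_H = re.compile(r"[0-9a-f]+h", re.IGNORECASE)

def naive_classify(tok: str) -> str:
    """Hypothetical lexer WITHOUT the leading-digit rule (not how NASM works)."""
    if tok[0].isdigit() or HEX_H.fullmatch(tok):
        return "number"
    return "name"

for tok in ["abcdefgh", "abcdefh", "each", "beach"]:
    print(tok, "->", naive_classify(tok))
```

Under this naive scheme, abcdefgh stays a name but abcdefh, each and beach all become numbers.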
You couldn't use English words like each: as label / symbol names (each is all hex digits plus a trailing h, i.e. 0xeac), for the same reason you can't use 1234: (I tried: foo.asm:1: error: label or instruction expected at start of line). each is a valid C identifier, so it would be inconvenient not to be able to use it. A $ prefix, as in $eax, lets you use a register name like eax as a symbol name. But $1234 in NASM is equivalent to 0x1234, with $ doing double duty as a hex indicator, so a $ prefix can't turn a token that starts with a digit into a symbol name.
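The double duty of `$` can be sketched the same way. The character right after the `$` decides whether the token is a hex number or an escaped symbol name. `classify_dollar` is a hypothetical helper. Like the other sketches here, it assumes the name-start set from manual 3.1 and well-formed hex digits after `$<digit>`.

```python
import string

NAME_START = string.ascii_letters + "_.?"

def classify_dollar(tok: str) -> str:
    """Classify a $-prefixed token by the character after the $ (sketch only)."""
    if len(tok) < 2 or tok[0] != "$":
        return "other"
    nxt = tok[1]
    if nxt in string.digits:
        # $ then a digit: hex literal, so $1234 means 0x1234 and $0c8 means 0xc8.
        # Assumes the rest is valid hex; int() raises ValueError otherwise.
        return f"number {int(tok[1:], 16):#x}"
    if nxt in NAME_START:
        # $ then a name character: the rest is a symbol name, even if it
        # would otherwise be a register or keyword (e.g. a symbol called eax).
        return f"symbol {tok[1:]}"
    return "other"

for tok in ["$1234", "$0c8", "$c8", "$eax"]:
    print(tok, "->", classify_dollar(tok))
```

This is also why $c8 isn't hex: the letter after the $ makes it the symbol c8, which matches the manual's note that a $-prefixed hex number needs a digit after the $.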
And perhaps most importantly, this is how earlier x86 assemblers for DOS worked, the ones NASM cherry-picked the good parts of its syntax from: MASM, but also A86, as86, and similar assemblers.
In the early days of NASM, people were switching to NASM from other assemblers and would already know this rule.
(How to represent hex value such as FFFFFFBB in x86 assembly programming? mentions a few assemblers other than NASM.)
None of this justifies the omission from the manual, merely explains it. A wording tweak to mention this in 3.4.1 would be a good idea.