
I am trying to assemble a tiny piece of code with NASM. This code sets CX to all zeros and AX to all ones. My code:

mov cx, 0000h
mov ax, ffffh

But I get this error:

$ nasm foo.asm
foo.asm:2: error: symbol `ffffh' not defined

I can resolve this error by writing mov ax, 0ffffh instead. But why does it not understand the ffffh syntax? Where in the NASM documentation does it specify what hexadecimal syntax is allowed and what is not?

I read https://nasm.us/doc/nasmdoc3.html#section-3.4.1 but cannot find anything there that disallows the ffffh syntax. What am I missing?

I also read some of the similar questions linked in the comments on this question. But none of them seems to point to authoritative documentation or a specification confirming that a number must begin with a digit. If someone can point to the exact excerpt in the NASM documentation, or in some spec, that confirms this, that would answer this question.

Sep Roland
Lone Learner
  • That `f` is a letter and letters start symbols. Numbers have to start with a digit so you need a leading `0`. – Jester Nov 25 '22 at 13:31
  • Does this answer your question (the linked question is about MASM not NASM but they support similar syntax for hex constants)? [How do I write letter-initiated hexadecimal numbers in masm code?](https://stackoverflow.com/questions/33276232/how-do-i-write-letter-initiated-hexadecimal-numbers-in-masm-code) – msaw328 Nov 25 '22 at 14:00
  • @msaw328: [How to represent hex value such as FFFFFFBB in x86 assembly programming?](https://stackoverflow.com/q/11733731) has an answer that covers NASM specifically, and mentions the same reason. – Peter Cordes Nov 25 '22 at 14:21
  • @PeterCordes One of the [answers](https://stackoverflow.com/a/37152498/5457426) to that question links to the question i referenced. I wanted to point to the original source. – msaw328 Nov 25 '22 at 14:22
  • @msaw328: It's not like that MASM answer actually has more to say about it than my answer a year later. I mostly added a link to it into my existing answer for help finding it in cases where it was a more appropriate duplicate (MASM questions). I checked the edit history, and my NASM/GAS/MASM answer said what it says about leading digits being required before I found and added that link to an earlier Q&A. So it's not an "original source" for my answer. Neither of them are original sources (like the NASM manual), just re-explaining a common fact. – Peter Cordes Nov 25 '22 at 14:27
  • Type `0FFFFh` and it will work – puppydrum64 Dec 02 '22 at 18:38
  • @puppydrum64 That's not the question. – Lone Learner Dec 13 '22 at 09:49

1 Answer


Surprisingly, I don't see an explicit statement of the grammar rule that a numeric literal must start with a decimal digit. The section you linked mentions it only indirectly: a hex number prefixed with a $ sign must have a digit after the $ rather than a letter. But the manual doesn't say "must still", which would at least imply that a leading digit is always required.

Earlier, in 3.1, they say that identifiers must start with letters, but don't say that only identifiers can start with letters. (That wouldn't be true anyway: register names and instruction mnemonics can too. Numeric literals can't.)


This might be one of those things that's so obviously true and well known (to the developers / manual authors) that they forgot to write it down explicitly anywhere in the manual.

The hex examples do show it, though: they include 0c8h but not c8h. The examples in other bases show that a leading zero isn't required there.
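
As a minimal sketch of what those examples imply for the constant in the question, here are the hex spellings from section 3.4.1 of the manual applied to ffff. The register choice and the default 16-bit flat-binary output (plain `nasm foo.asm`) are taken from the question; nothing else is assumed.

```nasm
; Four spellings of the same 16-bit constant. Each one either starts
; with a digit or has a digit right after the $ prefix.
        mov     ax, 0ffffh      ; h suffix: the leading 0 makes it a number
        mov     ax, 0xffff      ; 0x prefix
        mov     ax, 0hffff      ; 0h prefix
        mov     ax, $0ffff      ; $ prefix: the 0 after the $ is required

; The form from the question starts with a letter, so NASM treats it
; as a symbol name and reports: symbol `ffffh' not defined
;       mov     ax, ffffh
```

All four uncommented lines should assemble to the same instruction bytes.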


Some of the things that make it "obvious" and necessary that tokens starting with an alphabetic character should never be parsed as numeric literals:

  • AH through DH are register names, so must not get parsed as numbers. It would be very weird if EH was a numeric literal but DH wasn't. (It's normal that register names fit the same pattern as symbol names, not numbers. Unless you're on PowerPC, where GAS syntax just uses bare numbers for both registers and immediates; you have to remember which positions are which by instruction. Or use gcc -mregnames. But that's an IBM architecture so of course it uses weird conventions, like numbering the bits backwards.)

  • It would be super weird for abcdefgh to be a symbol name but abcdefh to be a numeric literal (without the g, it's all valid hex digits plus a trailing h).

  • You couldn't use English words like each: as label / symbol names, for the same reason you can't use 1234:. (I tried; foo.asm:1: error: label or instruction expected at start of line). each is a valid C identifier, so it would be inconvenient not to be able to use it. A $ prefix wouldn't rescue it either: $eax lets you use eax as a symbol name, but $1234 in NASM is equivalent to 0x1234, with $ doing double duty as a hex indicator, so it doesn't turn a token made of digits into a symbol name.

  • And perhaps most importantly, this is how earlier x86 assemblers for DOS worked, the ones NASM cherry-picked the good parts of its syntax from: MASM, but also A86, as86, and the like.
    In the early days of NASM, people were switching to NASM from other assemblers and would already know this rule. (How to represent hex value such as FFFFFFBB in x86 assembly programming? mentions a few assemblers other than NASM.)
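
The first and third points can be seen side by side in a short sketch. It assumes the same default 16-bit flat-binary output as the question; the label name each is the one used above, and the registers are picked only for illustration.

```nasm
each:                           ; only hex digits plus a trailing h, but it
                                ; starts with a letter, so it is a label
        mov     al, dh          ; DH the register
        mov     al, 0dh         ; the number 13: the leading 0 makes it numeric
        mov     bx, each        ; address of the label defined above
        mov     bx, $1234       ; $ followed by a digit: the same as 0x1234
```

If a leading letter could start a number, the first two lines would be ambiguous: each: could be read as the hex value EAC used as a label, and dh as the hex digit D.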


None of this justifies the omission from the manual, merely explains it. A wording tweak to mention this in 3.4.1 would be a good idea.

Peter Cordes