How should assemblers distinguish between symbol and all-alpha hex value?

Question

I'm learning some 8080 assembly, which uses the older H suffix to indicate hexadecimal constants (rather than the more modern 0x or $ prefix). I'm also noodling around with a toy assembler and thinking about how to tokenize source code.

It's possible to write a valid hex constant (say) BEEFH, which contains only alphabetical characters. It's also possible to define a label called BEEFH. So when I write:

```asm
ORG 0800H

START:  ...
        JMP BEEFH   ; <--- how is this resolved?
        ....

BEEFH:  ...
        ...
```

According to the old Intel docs, this should be syntactically valid: BEEFH meets the naming rules for labels, and it also looks like a valid 16-bit hex address. Whether the operand to JMP here is an address constant or an identifier is ambiguous, which seems like a problem.
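To make the ambiguity concrete, here is a minimal sketch of a naive tokenizer that checks a token against two patterns, one for labels and one for H-suffixed hex constants. Both patterns and the `readings` helper are hypothetical simplifications for illustration, not Intel's exact naming or constant rules. BEEFH matches both, which is exactly the problem described above.

```python
import re

# Hypothetical, simplified patterns (not Intel's exact rules):
# a label starts with a letter, @ or ? and continues with letters/digits/@/?;
# a suffixed hex constant is one or more hex digits followed by H.
IDENTIFIER = re.compile(r"[A-Z@?][A-Z0-9@?]*")
HEX_SUFFIXED = re.compile(r"[0-9A-F]+H")


def readings(token: str) -> list[str]:
    """Return every interpretation a naive tokenizer could give a token."""
    token = token.upper()
    kinds = []
    if IDENTIFIER.fullmatch(token):
        kinds.append("identifier")
    if HEX_SUFFIXED.fullmatch(token):
        kinds.append("hex constant")
    return kinds


for t in ["BEEFH", "0800H", "START"]:
    print(t, readings(t))
```

With these patterns, `readings("BEEFH")` returns both `"identifier"` and `"hex constant"`, while `0800H` and `START` each get a single reading.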

I don't have access to the original 8080 assembler to see what it does with this example. Here's an online 8080 assembler that appears to parse the operand to JMP as a label reference in all cases, but obviously a proper assembler should be able to target an absolute address with a JMP instruction.

Can anyone shed light on what the conventions around this actually are/should be? Am I missing something obvious? Thanks.

Many different types of grammars and parsers can parse assembly language. — TomServo, Jul 08 '21 at 00:37
Assembly language is specific to the tool, not the target (assume there are as many incompatible 8080 assembly languages as there are authors of tools for that target). Typically they will precede the number with a zero in this case: 0BEEFh. But some assembly languages distinguish labels from numbers in other ways, such as $BEEF or #0xBEEF, as well as simply supporting 0xBEEF, borrowed from C syntax. — old_timer, Jul 08 '21 at 12:09

Accepted Answer · answered Jul 07 '21 at 21:38


Someone left a comment that they then deleted, but I looked again and it was right on. Apparently I missed the note in the old Intel manual that says this about hex constants:

> Hex constants must begin with a decimal digit.

So that's how the ambiguity is avoided when parsing: BEEFH is always a symbol, and the hex constant has to be written 0BEEFH. It seems a bit inelegant to me as a solution, but I guess that if your assembler supports one, you should just use a modern prefix instead.
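A toy assembler could apply this rule at tokenization time roughly as follows. This is a minimal sketch, not the original Intel assembler's code: the `classify` function is hypothetical, and the set of radix suffixes (H hex, O/Q octal, B binary, D or none for decimal) is an assumption modeled on Intel-style conventions. Check your own assembler's manual for its actual set.

```python
# Radix suffixes assumed here (Intel-style): H hex, O/Q octal, B binary,
# D or no suffix decimal. This is an assumption, not a spec quote.
SUFFIX_RADIX = {"H": 16, "O": 8, "Q": 8, "B": 2, "D": 10}
DIGITS = "0123456789ABCDEF"


def classify(token: str) -> tuple[str, object]:
    """Return ("number", value) or ("symbol", name) for one operand token.

    Applies the leading-digit rule: a token is numeric only if its first
    character is a decimal digit, so BEEFH is always a symbol and the
    hex constant has to be written 0BEEFH.
    """
    tok = token.upper()
    if not tok:
        raise ValueError("empty token")

    if tok[0] in DIGITS[:10]:
        suffix = tok[-1]
        if suffix in SUFFIX_RADIX:
            body, radix = tok[:-1], SUFFIX_RADIX[suffix]
        else:
            body, radix = tok, 10  # no suffix: plain decimal
        # Validate the digits explicitly instead of trusting int(), which
        # would also accept underscores or a "0x" prefix in base 16.
        if not body or any(c not in DIGITS[:radix] for c in body):
            raise ValueError(f"malformed numeric constant: {token!r}")
        return ("number", int(body, radix))

    if tok[0].isalpha() or tok[0] in "@?":
        return ("symbol", tok)

    raise ValueError(f"unexpected token: {token!r}")


for t in ["BEEFH", "0BEEFH", "0800H", "101B", "12"]:
    print(t, classify(t))
```

Here `classify("BEEFH")` returns `("symbol", "BEEFH")` while `classify("0BEEFH")` returns `("number", 48879)`. Because the decision is made from the first character alone, the tokenizer never needs the symbol table to tell the two apart.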

Thanks, anonymous commenter!

Ben Zotto


Well, that was me, but I wasn't sure I remembered it correctly, so I removed it again - you are welcome :) – 500 - Internal Server Error Jul 07 '21 at 22:56

Yeah, the need for a leading 0 on constants that would otherwise start with A..F is why I dislike the trailing-h style, and why I always prefer `0xFFFF` in x86 assemblers like NASM, where you have the option of using either. Unfortunately, some x86 assemblers (mostly DOS-originated ones like MASM) *only* allow the clunky trailing-h suffix. [How to represent hex value such as FFFFFFBB in x86 assembly programming?](https://stackoverflow.com/a/37152498) – Peter Cordes Jul 08 '21 at 19:30
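For comparison, here is a hypothetical helper (`parse_hex`, not taken from NASM, MASM, or any real assembler) that accepts either the C-style `0xFFFF` spelling or the trailing-h `0FFFFh` spelling. It is only a sketch under those assumptions, but it shows that both accepted forms start with a decimal digit, so the same "digit first means number" test keeps them apart from symbols, while a bare `FFFFh` is rejected as a constant.

```python
HEX_DIGITS = "0123456789abcdef"


def parse_hex(token: str) -> int:
    """Parse a hex constant written as 0xFFFF or 0FFFFh (case-insensitive).

    Hypothetical helper: both accepted spellings begin with a decimal
    digit, so they can never be mistaken for a symbol such as FFFFh.
    """
    t = token.lower()
    if t.startswith("0x"):
        digits = t[2:]                      # C-style prefix
    elif t.endswith("h") and t[:1].isdigit():
        digits = t[:-1]                     # trailing-h suffix, leading digit required
    else:
        raise ValueError(f"not a hex constant: {token!r}")
    if not digits or any(c not in HEX_DIGITS for c in digits):
        raise ValueError(f"not a hex constant: {token!r}")
    return int(digits, 16)


for t in ["0xFFFF", "0FFFFh", "0BEEFH"]:
    print(t, parse_hex(t))
```

The prefix form needs no extra rule: its mandatory `0` already satisfies the leading-digit requirement, which is part of why it avoids the clunkiness of writing `0FFFFh`.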

