Is there a publicly available grammar or parser for ARM's Unified Assembler Language, as described in section A4.2 of the ARM Architecture Reference Manual?

This document uses the ARM Unified Assembler Language (UAL). This assembly language syntax provides a canonical form for all ARM and Thumb instructions.

UAL describes the syntax for the mnemonic and the operands of each instruction.

Put simply, I'm interested in code for parsing the mnemonic and the operands of each instruction. For example, how could you define a grammar for these lines?

ADC{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm>, <type> <Rs>
IT{<x>{<y>{<z>}}}{<q>} <firstcond>
LDC{L}<c> <coproc>, <CRd>, [<Rn>, #+/-<imm>]{!}
auselen
  • @dwelch I tried to improve the question. – auselen May 29 '13 at 14:49
  • Relevant: [Ubuntu on ARM assembler](https://wiki.ubuntu.com/ARM/Thumb2PortingHowto#Types_of_Assembly_Language)? – artless noise May 29 '13 at 17:00
  • Sorry I misunderstood the question. Perhaps gnu assembler or gnu c has something you can use. – old_timer May 29 '13 at 17:02
  • I think Chapter A4.2 of **ARM DDI 0406B** or the ARMv7A ARM, entitled *Unified Assembler Language* **is** the specification. It has sub-sections of *conditionals* and *labels*. The mnemonic (ASCII letters) are already equivalent between **thumb-2** and **ARM**; it is up to the assembler to pick a physical encoding. I am not sure I understand the question either? – artless noise May 29 '13 at 17:12
  • @dwelch no need to apologize, you helped me clear my mind. – auselen May 30 '13 at 05:39
  • @artlessnoise yes, that definitely looks like the only specification. However, as I tried to clarify in the question: how could you create a mechanism that is capable of parsing _lines_ for all possible instructions? Formally, a grammar tells you how to do it, but in the ARM UAL case I can't seem to find one. Maybe a context-free grammar is unnecessary, but then again there should be a common syntax for all instructions - or not? – auselen May 30 '13 at 05:44
  • @artlessnoise btw thanks for the ubuntu link. – auselen May 30 '13 at 05:48

1 Answer

If you need to create a simple parser based on an example-based grammar, nothing beats ANTLR:

http://www.antlr.org/

ANTLR translates a grammar specification into lexer and parser code. It's much more intuitive to use than Lex and Yacc. The grammar below covers part of what you specified above, and it's fairly easy to extend to do what you want:

grammar armasm;

/* Rules */
program: (statement | NEWLINE) +;

statement: (ADC (reg ',')? reg ',' reg ',' reg
    | IT firstcond
    | LDC coproc ',' cpreg (',' reg ','  imm )? ('!')? ) NEWLINE;

reg: 'r' INT;
coproc: 'p' INT;
cpreg: 'cr' INT;
imm: '#' ('+' | '-')? INT;
firstcond: '?';

/* Tokens */
ADC: 'ADC' ('S')? ; 
IT:   'IT';
LDC:  'LDC' ('L')?;

INT: [0-9]+;
NEWLINE: '\r'? '\n';
WS: [ \t]+ -> skip;

From the ANTLR site (OSX instructions):

$ cd /usr/local/lib
$ wget http://antlr4.org/download/antlr-4.0-complete.jar
$ export CLASSPATH=".:/usr/local/lib/antlr-4.0-complete.jar:$CLASSPATH"
$ alias antlr4='java -jar /usr/local/lib/antlr-4.0-complete.jar'
$ alias grun='java org.antlr.v4.runtime.misc.TestRig'

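Then run ANTLR on the grammar file, compile the generated code, and feed the test rig some input on stdin (finish the input with EOF, i.e. Ctrl-D):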

antlr4 armasm.g4
javac *.java
grun armasm program -tree

    ADCS r1, r2, r3
    IT ?
    LDC p3, cr2, r1, #3 
    <EOF>

This yields the parse tree broken down into tokens, rules, and data:

(program (statement ADCS (reg r 1) , (reg r 2) , (reg r 3) \n) (statement IT (firstcond ?) \n) (statement LDC (coproc p 3) , (cpreg cr 2) , (reg r 1) , (imm # 3) \n))

The grammar doesn't yet include the instruction condition codes or any of the details of the IT instruction (I'm pressed for time). ANTLR generates a lexer and parser, and the grun alias then wraps them in a test rig so I can run text snippets through the generated code. The generated API is straightforward to use in your own applications.
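
As a rough illustration of what the missing condition-code handling involves, here is a minimal sketch in plain Python rather than ANTLR. It only splits an ADC mnemonic written in the `ADC{S}{<c>}{<q>}` form from the question. The names `CONDS`, `ADC_MNEMONIC` and `split_adc` are made up for this example and are not part of the grammar above or of any ARM tool, and treating a missing condition as `AL` is an assumption of the sketch.

```python
import re

# ARM condition-code suffixes, including the HS/LO synonyms for CS/CC.
CONDS = "EQ|NE|CS|HS|CC|LO|MI|PL|VS|VC|HI|LS|GE|LT|GT|LE|AL"

# ADC{S}{<c>}{<q>}: optional S flag, optional condition code,
# optional .N/.W width qualifier (hypothetical pattern for this sketch).
ADC_MNEMONIC = re.compile(
    rf"ADC(?P<s>S)?(?P<cond>{CONDS})?(?P<q>\.[NW])?", re.IGNORECASE
)


def split_adc(mnemonic):
    """Split an ADC mnemonic into its parts, or return None if it is not one."""
    m = ADC_MNEMONIC.fullmatch(mnemonic)
    if m is None:
        return None
    return {
        "set_flags": m.group("s") is not None,
        # Assumption: no suffix means "always" (AL).
        "cond": (m.group("cond") or "AL").upper(),
        "qualifier": (m.group("q") or "").upper(),
    }


if __name__ == "__main__":
    for text in ("ADCS", "ADCCS", "ADCSEQ.W", "ADCX"):
        print(text, split_adc(text))
```

Note that `ADCS` (set flags) and `ADCCS` (condition `CS`, no flags) come out differently because the optional `S` is tried before the two-letter condition. In an ANTLR grammar the same ordering would probably have to be encoded in the lexer rule for each mnemonic.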

For completeness, I looked online for an existing grammar and didn't find one. Your best bet there might be to take apart gas (the GNU assembler) and extract its parser spec, but it won't be UAL syntax and it will be GPL, if that matters to you. If you only need to handle a subset of the instructions, then writing an ANTLR grammar like the one above is a good way to go.

Joe P
  • +1, so if I understand correctly, if I copy&paste all instructions to a file, antlr can create a grammar out of it? – auselen May 30 '13 at 20:23
  • Sorry I've been away so long. No - just that if you translate that list of instructions into an ANTLR grammar (which isn't so hard), you get an auto-generated lexer and parser. I'm willing to help, since it's useful to me as well. – Joe P Jun 12 '13 at 21:36
  • I was creating a list of instructions out of ARM ARM, I'll try to finish it soon, then will work on the grammar. https://gist.github.com/auselen/5681633 – auselen Jun 13 '13 at 06:41
  • I'm still working on a first cut at the grammar, but my time to work on it is spotty. I'll let you know when I get something working. – Joe P Jul 05 '13 at 23:01