
Hi. I'm trying to write an assembler with JavaCC that translates assembly code for the 8051 microcontroller into machine code. I have read about JavaCC grammars and how they are structured, but I have a dilemma. For example, I have the ADD instruction:

`ADD A,Rn`   or   `ADD  A,@Ri` 

Each form has its own machine code (hex) encoding; for example, `ADD A,R0` translates to 28H. I also have the MOV instruction: `MOV A,Rn` or `MOV A,@Ri`, but also `MOV data_addr,Rn`, `MOV R6,#data`, and so on.

Now my problem is how to distinguish between two instructions. Suppose I define my tokens like this:
TOKEN : {
  <IN_MOV: "mov">
| <IN_ADD: "add">
}

I can't define a separate function for each token to give it specific behavior, because I have many instructions. Checking whether `token.image.equals("mov")` and then branching to the matching behavior seems like a bit much, don't you think? So I'm pretty much stuck; I don't know which way to go.
Thanks for the help!

Alexander

2 Answers


It seems you expect too much from the lexer. The lexer is a finite state machine, while the parser is not.

So the lexer should produce tokens for the instructions (MOV, ADD, ...) and tokens for the operands. The lexer should not try to be too clever and expect specific operands for specific instructions.
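To illustrate that split, here is a minimal hand-written Java sketch of a lexer that only classifies tokens. It is not JavaCC-generated code; in JavaCC the same thing would be expressed as `TOKEN` definitions. The names `Kind`, `Tok` and `AsmLexer` are hypothetical, and only `MOV` and `ADD` are recognized as mnemonics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical token kinds; a JavaCC grammar would declare these as TOKENs.
enum Kind { MNEMONIC, INDIRECT, REG, ACC, IMMEDIATE, NUMBER, COMMA }

final class Tok {
    final Kind kind;
    final String image;
    Tok(Kind kind, String image) { this.kind = kind; this.image = image; }
    @Override public String toString() { return kind + "(" + image + ")"; }
}

final class AsmLexer {
    // One named group per token kind. The lexer knows nothing about which
    // operands belong to which instruction.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(?<MNEMONIC>MOV|ADD)\\b"
      + "|(?<INDIRECT>@R[01])\\b"
      + "|(?<REG>R[0-7])\\b"
      + "|(?<ACC>A)\\b"
      + "|(?<IMMEDIATE>#[0-9][0-9A-F]*H?)"
      + "|(?<NUMBER>[0-9][0-9A-F]*H?)"
      + "|(?<COMMA>,))",
        Pattern.CASE_INSENSITIVE);

    static List<Tok> tokenize(String line) {
        String s = line.trim();
        List<Tok> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        int pos = 0;
        while (pos < s.length()) {
            m.region(pos, s.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("bad input at column " + pos + ": " + s);
            }
            // Exactly one named group matched; find out which one.
            for (Kind k : Kind.values()) {
                String image = m.group(k.name());
                if (image != null) { out.add(new Tok(k, image)); break; }
            }
            pos = m.end();
        }
        return out;
    }
}
```

Note that this lexer happily tokenizes a meaningless line such as `ADD #5,#5`; rejecting it is the parser's job.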

Now the parser can expect specific combinations of instructions and operands. For example, you can accept only @ operands with the MOV instruction, so that any other operand will cause a parse exception.

If you need to further validate the combination of instructions and operands, you have to do it in the code of the productions. For example, you can treat two identical operands as an error for some instructions; this is very difficult to express in a production but trivial in code.
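As a rough sketch of both ideas, the hand-written Java below plays the role of the productions: one method per instruction accepts only the operand forms that instruction allows, and a value check that is awkward to express in a grammar is done in code. It is not JavaCC output, and `MiniAssembler` and its methods are hypothetical names. The opcodes are taken from the 8051 instruction set (`ADD A,Rn` = 28H+n, `ADD A,@Ri` = 26H+i, `MOV A,Rn` = E8H+n, `MOV A,@Ri` = E6H+i, `MOV Rn,#data` = 78H+n followed by the data byte). Only these forms are covered, and immediates are assumed to be decimal. The range check stands in for the "identical operands" example mentioned above.

```java
import java.util.Locale;

final class MiniAssembler {
    // Plays the role of a generic Instruction() production: dispatch on the mnemonic.
    static int[] assemble(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("missing operands: " + line);
        }
        String mnemonic = parts[0].toUpperCase(Locale.ROOT);
        String[] ops = parts[1].trim().toUpperCase(Locale.ROOT).split("\\s*,\\s*");
        if (ops.length != 2) {
            throw new IllegalArgumentException("expected two operands: " + line);
        }
        switch (mnemonic) {
            case "ADD": return add(ops[0], ops[1]);
            case "MOV": return mov(ops[0], ops[1]);
            default: throw new IllegalArgumentException("unknown instruction: " + mnemonic);
        }
    }

    // ADD accepts only A,Rn and A,@Ri in this sketch.
    private static int[] add(String dst, String src) {
        if (dst.equals("A") && src.matches("R[0-7]")) return new int[] { 0x28 + digit(src) };
        if (dst.equals("A") && src.matches("@R[01]")) return new int[] { 0x26 + digit(src) };
        throw new IllegalArgumentException("ADD does not accept " + dst + "," + src);
    }

    // MOV accepts A,Rn / A,@Ri / Rn,#data in this sketch.
    private static int[] mov(String dst, String src) {
        if (dst.equals("A") && src.matches("R[0-7]")) return new int[] { 0xE8 + digit(src) };
        if (dst.equals("A") && src.matches("@R[01]")) return new int[] { 0xE6 + digit(src) };
        if (dst.matches("R[0-7]") && src.matches("#[0-9]+")) {
            int value = Integer.parseInt(src.substring(1));
            // Semantic check done in code: the immediate must fit in one byte.
            if (value > 0xFF) {
                throw new IllegalArgumentException("immediate out of range: " + value);
            }
            return new int[] { 0x78 + digit(dst), value };
        }
        throw new IllegalArgumentException("MOV does not accept " + dst + "," + src);
    }

    // Register number is the last character of operands like R6 or @R1.
    private static int digit(String operand) {
        return operand.charAt(operand.length() - 1) - '0';
    }
}
```

With this layout, adding an instruction means adding one method and one `case`, rather than a chain of `token.image.equals(...)` tests.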

If you need to validate even further, for example by detecting invalid sequences of instructions, then you will have to maintain a state across the productions, or even build an AST and process it after the parsing is complete.
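A minimal Java sketch of the "process it after parsing" variant might look like the following. The parser is assumed to have produced a flat list of instruction nodes, and a separate pass then checks their order. The `Instr` and `SequenceChecker` classes are hypothetical, and the rule used here (nothing may follow an `END` directive) is only an illustrative assumption, not something taken from the 8051 specification.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal AST node: the mnemonic plus the source line it came from.
final class Instr {
    final String mnemonic;
    final int line;
    Instr(String mnemonic, int line) { this.mnemonic = mnemonic; this.line = line; }
}

final class SequenceChecker {
    // Post-parse pass: reports every instruction that appears after END.
    static List<String> check(List<Instr> program) {
        List<String> errors = new ArrayList<>();
        boolean ended = false;
        for (Instr in : program) {
            if (ended) {
                errors.add("line " + in.line + ": " + in.mnemonic + " after END");
            }
            if (in.mnemonic.equals("END")) {
                ended = true;
            }
        }
        return errors;
    }
}
```

The same check could instead be done during parsing by keeping the `ended` flag as a field of the parser and updating it inside the productions.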

Laurent Pireyn
  • "For example, you can accept only @ operands with the MOV instruction, so that any other operand will cause a parse exception." How can I do that? I identify the token, then I identify the argument, and after that I just state my cases? – Alexander Mar 15 '11 at 16:11
  • @Alexander You could define the `Mov()` production as follows: ` `. Alternatively, you can do that in the code of a more generic `Instruction()` production. – Laurent Pireyn Mar 15 '11 at 16:48
  • So basically I define one parser function per instruction, right? `Mov()` will have ` `, and I will call each of these functions from a higher-level one, like so: `Instr() {Mov() | And() | etc}` – Alexander Mar 15 '11 at 18:33
  • @Alexander Indeed. This way you can easily specify per-instruction grammar (the number of operands, the allowed types of operands, ...) and possibly special treatment in code (like forbidding some combinations of operand values). – Laurent Pireyn Mar 15 '11 at 22:03

See this complete assembly language grammar for lots of examples of the kinds of things you need to write in your parser for assembler code.

Community
Ira Baxter