Whitespace separation in jflex grammar

Question

Suppose I need a simple grammar that describes a language like

foo 2
bar 21

but not

foo1

Using JFlex, I wrote something like

WORD=[a-zA-Z]+
NUMBER=[0-9]+
WHITE_SPACE_CHAR=[\ \n\r\t\f]

%state AFTER_WORD
%state AFTER_WORD_SEPARATOR

%%
<YYINITIAL>{WORD}               { yybegin(AFTER_WORD); return TokenType.WORD; }
<AFTER_WORD>{WHITE_SPACE_CHAR}+ { yybegin(AFTER_WORD_SEPARATOR); return TokenType.WHITE_SPACE; }
<AFTER_WORD_SEPARATOR>{NUMBER}  { yybegin(YYINITIAL); return TokenType.NUMBER; }

{WHITE_SPACE_CHAR}+             { return TokenType.WHITE_SPACE; }

But I don't like the extra states, which exist only to say that there must be whitespace between the word and the number. How can I simplify my grammar?

score 4 · Answer 1 · edited Feb 25 '15 at 06:38

You shouldn't need white space tokens when parsing at all.

Get rid of TokenType.WHITE_SPACE, and when you get white space in the lexer, just ignore it instead of returning anything.

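To prevent 'foo1', add another rule for [A-Za-z0-9]+ and another token type for it that doesn't appear in the grammar; then it's a syntax error. Put this rule after the WORD and NUMBER rules: JFlex takes the longest match and, when two rules match the same length, the one listed first.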
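The catch-all idea can be tried out in plain Java to see how the rules interact. This is only a sketch, not JFlex-generated code: the `BAD_WORD` token name, the `CatchAllLexerSketch` class and the regex-based matching loop are assumptions made for illustration. The loop mimics JFlex's matching policy (longest match wins, and on a tie the rule listed first wins) and silently drops whitespace, as suggested above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CatchAllLexerSketch {
    // Insertion order is the rule order: on equal-length matches the earlier rule wins.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("WORD", Pattern.compile("[a-zA-Z]+"));
        RULES.put("NUMBER", Pattern.compile("[0-9]+"));
        // Hypothetical catch-all token that the grammar never accepts.
        RULES.put("BAD_WORD", Pattern.compile("[a-zA-Z0-9]+"));
        RULES.put("WHITE_SPACE", Pattern.compile("[ \\n\\r\\t\\f]+"));
    }

    /** Returns the token types found in the input, with whitespace ignored. */
    static List<String> tokenTypes(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so the first rule keeps priority on ties.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestType = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestType == null) {
                throw new IllegalArgumentException("Unexpected character at " + pos);
            }
            if (!bestType.equals("WHITE_SPACE")) {
                out.add(bestType); // whitespace is skipped, not returned
            }
            pos = bestEnd;
        }
        return out;
    }
}
```

With these rules, `foo 2` lexes as a WORD followed by a NUMBER, while `foo1` becomes a single BAD_WORD token because the catch-all match is longer than the WORD match. A parser that has no production for BAD_WORD then reports it as a syntax error.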

edited Feb 25 '15 at 06:38

AdrieanKhisbe

answered Jan 30 '13 at 00:36

user207421

That seems right. But I actually need those whitespace tokens, since I am also developing a plugin for an IDE, where all elements are valuable. – Stan Kurilin Jan 30 '13 at 11:35
Which IDE are you targeting? – Bastien Jansen Jan 30 '13 at 12:43
Then you should not get rid of TokenType.WHITE_SPACE as @EJP suggested, because it is needed in your `ParserDefinition`. The JFlex snippet I suggested in my answer should work. Then in your parser you will write the logic that checks if an identifier is followed by a number. I suggest you take a look at this very nice tutorial, if you haven't done it yet: http://confluence.jetbrains.com/display/IntelliJIDEA/Custom+Language+Support There you will learn how to write a parser using Grammar-Kit, which is a very helpful tool :) – Bastien Jansen Jan 30 '13 at 12:56
@Nebelmann OK, thanks. Of course I saw that tutorial. But since I also need a parser that is separate from IDEA, I decided to make the lexer more powerful. It was a mistake. – Stan Kurilin Jan 30 '13 at 13:28

score 1 · Accepted Answer · answered Jan 29 '13 at 17:43

From what I know of JFlex, if you are recognizing whitespace correctly (which seems to be the case), you don't have to use extra states. Just make a rule for "identifiers", and another one for "numbers".

%%
{WORD}    { return TokenType.WORD; }
{NUMBER}  { return TokenType.NUMBER; }

If your language requires each line to consist of exactly one identifier, one space and one number, this should be checked by syntactic analysis (i.e. by a parser), not lexical analysis.
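As a rough illustration of moving that check into the parser, here is a minimal Java sketch that validates a token stream instead of relying on lexer states. The `WordNumberParserSketch` class and its `isValid` method are hypothetical names, and the sketch assumes the lexer emits one WHITE_SPACE token per whitespace run (as a `{WHITE_SPACE_CHAR}+` rule does). For simplicity it treats any whitespace run alike, so it does not distinguish a single space from a line break.

```java
import java.util.List;

final class WordNumberParserSketch {
    enum TokenType { WORD, NUMBER, WHITE_SPACE }

    /**
     * Accepts zero or more "WORD WHITE_SPACE NUMBER" entries separated by
     * WHITE_SPACE tokens; leading and trailing whitespace is allowed.
     */
    static boolean isValid(List<TokenType> tokens) {
        final int n = tokens.size();
        int i = 0;
        if (i < n && tokens.get(i) == TokenType.WHITE_SPACE) {
            i++; // optional leading whitespace
        }
        while (i < n) {
            // Each entry needs exactly three tokens in this order.
            if (i + 3 > n
                    || tokens.get(i) != TokenType.WORD
                    || tokens.get(i + 1) != TokenType.WHITE_SPACE
                    || tokens.get(i + 2) != TokenType.NUMBER) {
                return false;
            }
            i += 3;
            if (i == n) {
                break;
            }
            // Consecutive entries must be separated by whitespace.
            if (tokens.get(i) != TokenType.WHITE_SPACE) {
                return false;
            }
            i++;
        }
        return true;
    }
}
```

With the two-rule lexer, `foo1` arrives as WORD immediately followed by NUMBER, so the check rejects it, while `foo 2` arrives as WORD, WHITE_SPACE, NUMBER and passes. The whitespace tokens stay in the stream, which matters if an IDE plugin needs them.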

The main difficulty is that I cannot clearly separate the lexer from the parser for my grammar, I believe. – Stan Kurilin, Jan 30 '13 at 11:37
