
I am trying to write regexes to detect IP addresses and floating-point numbers in re2c (http://re2c.org/). Here are the rules I am using:

<SYMBOL>        [-+]?[0-9]+[.][0-9]+ { RETURN(FLOAT); }
<SYMBOL>        [0-9]{1,3}'.'[0-9]{1,3}'.'[0-9]{1,3}'.'[0-9]{1,3} {RETURN (IPADDR); }

Whenever I compile, I get an error about YYMARKER being undeclared. If I use only one of the rules, compilation goes fine. I guess re2c has trouble with backtracking because the two rules share a large set of common prefixes (for example, 192.132 could be the start of either a floating-point number or an IP address).

Here is the command line I use to generate the tokenizer file. re2c itself does not report any error.

 re2c  -c -o tokenizer.c tokenizer.re

But when I compile the generated C file, I get the following error:

tokenizer.c: In function 'getnext_querytoken':
tokenizer.c:74: error: 'YYMARKER' undeclared (first use in this function)
tokenizer.c:74: error: (Each undeclared identifier is reported only once
tokenizer.c:74: error: for each function it appears in.)

Is there any way to solve this problem?

sushil

2 Answers

@sushil, you are right: YYMARKER is part of the re2c API.

However, re2c is not "having trouble with backtracking based regex since both the rules have a large data set". re2c-generated lexers iterate over the input only once (complexity is linear). YYMARKER is needed because the rules overlap, as explained in this example: http://re2c.org/examples/example_01.html :

YYMARKER (line 5) is needed because rules overlap: it backups input position of the longest successful match. Say, we have overlapping rules "a" and "abc" and input string "abd": by the time "a" matches there's still a chance to match "abc", but when lexer sees 'd' it must rollback. (You might wonder why YYMARKER is exposed at all: why not make it a local variable like yych? The reason is, all input pointers must be updated by YYFILL as explained in Arbitrary large input and YYFILL example.)
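
To make the "a" / "abc" case concrete, here is a minimal hand-written sketch of the same idea. It is not re2c output: the names lex, TOK_A, TOK_ABC and the local marker variable are made up for illustration, with marker playing the role that YYMARKER plays in generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token ids, for illustration only. */
enum token { TOK_NONE, TOK_A, TOK_ABC };

/* Hand-written stand-in for a lexer with the overlapping rules "a" and
 * "abc". Returns the matched token and stores the number of characters
 * consumed in *consumed. */
static enum token lex(const char *input, size_t *consumed)
{
    assert(input != NULL && consumed != NULL);

    const char *cursor = input;  /* like YYCURSOR: current position     */
    const char *marker = input;  /* like YYMARKER: end of last match    */

    if (*cursor != 'a') {
        *consumed = 0;
        return TOK_NONE;
    }
    ++cursor;

    /* Rule "a" has matched, but "abc" is still possible: save the
     * position before reading further. */
    marker = cursor;

    if (*cursor == 'b') {
        ++cursor;
        if (*cursor == 'c') {
            ++cursor;
            *consumed = (size_t)(cursor - input);
            return TOK_ABC;
        }
    }

    /* The longer rule failed (e.g. input "abd"): roll back to the saved
     * position and report the shorter match. */
    cursor = marker;
    *consumed = (size_t)(cursor - input);
    return TOK_A;
}
```

On "abd" the cursor gets as far as 'd' before being restored to just after the 'a'. The float and IP address rules in the question overlap in the same way (both can start with digits, a dot, then more digits), which would explain why the generated code refers to YYMARKER only when both rules are present.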

skvadrik

Looks like I did not read the manpage properly. According to the manpage, I needed to define YYMARKER myself to support backtracking in re2c. Here is the extract from http://re2c.org/manual.html:

YYMARKER l-value of type * YYCTYPE. The generated code saves backtracking information in YYMARKER. Some easy scanners might not use this.
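
For anyone hitting the same error, a minimal sketch of supplying YYMARKER could look like the following. The scanner struct, its field names and scanner_init are hypothetical (the question does not show how tokenizer.re stores its input); only the YY* names come from the re2c interface.

```c
#include <assert.h>
#include <stddef.h>

/* Character type the generated code works with. */
typedef unsigned char YYCTYPE;

/* Hypothetical state holder; the layout is an assumption. */
struct scanner {
    const YYCTYPE *cursor;  /* current input position              */
    const YYCTYPE *marker;  /* position saved for backtracking     */
    const YYCTYPE *limit;   /* one past the last input character   */
};

/* The generated code uses these names as l-values. The macros assume a
 * variable `s` of type `struct scanner *` is in scope wherever the
 * re2c block is placed. */
#define YYCURSOR (s->cursor)
#define YYMARKER (s->marker)  /* the definition that was missing */
#define YYLIMIT  (s->limit)

/* Point the scanner at a buffer of `len` characters. */
static void scanner_init(struct scanner *s, const char *text, size_t len)
{
    assert(s != NULL && text != NULL);
    YYCURSOR = (const YYCTYPE *)text;
    YYMARKER = YYCURSOR;
    YYLIMIT  = YYCURSOR + len;
}
```

A plain local variable named YYMARKER inside the scanning function should work as well; the struct is only one way to keep the pointers together.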

sushil