How do I properly parse Regex in ANTLR

Question

I want to parse this:

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i

and, of course, other variations of regular expressions. Does anyone know how to do this properly?

Thanks in advance.

Edit: I tried putting all regex symbols and characters into one lexer rule, like this:

REGEX: ( DIV | ('i') | ('@') | ('[') | (']') | ('+') | ('.') | ('*') | ('-') | ('\\') | ('(') | (')') | ('A') | ('w') | ('a') | ('z') | ('Z') );

and then wrote a parser rule like this:

regex_assignment: (REGEX)+ ;

but I get recognition errors (extraneous input). This is definitely because these characters are, of course, already used in other rules.

The thing is, I don't actually need to process these regex assignments; I just want them to be recognized without errors. Does anyone have an approach for this in ANTLR? A solution that simply recognizes this as a regex and skips it, for example, would be enough for me.

Welcome to SO. Please read [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) for information on how to write questions. It is important to show what you have already tried yourself. — Mike Lischke, May 06 '17 at 10:56
@MikeLischke I tried to give some more info but I don't know what else to write here. That's why I'm asking the question in the first place. — pokeahontas, May 08 '17 at 13:47
You need to first define a *grammar* for regex. If you don't know what this is, you'd better find out or you simply cannot use ANTLR. You will find that a grammar for regex is pretty simple, but it isn't what you show. Once you have that grammar, then you need to learn to run ANTLR on it. The ANTLR documentation is pretty complete. — Ira Baxter, May 08 '17 at 19:53
Thanks for your input. I have actually already written a big grammar for ERB and Ruby files for Rails, with a split lexer and parser. I was just looking for a quick solution to this, because I really just want my grammar to recognize this as a regex and move on. Integrating a whole grammar for this seems like overkill. I just want my parser to recognize that /.../i is a regex. — pokeahontas, May 08 '17 at 20:02
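A minimal sketch of the single-token approach, assuming Ruby-style literals: instead of one lexer token per regex character, let the lexer consume the whole `/.../flags` literal as one token, so the parser never sees the individual characters and they cannot clash with other rules. The grammar name `RegexAssignment` and the rules `program`, `CONSTANT`, `REGEX_LITERAL` and `REGEX_CHAR` are hypothetical, and the sketch assumes every `/` inside the literal is escaped as `\/` and that `#{...}` interpolation does not occur.

```antlr
grammar RegexAssignment;

// Hypothetical minimal grammar: a file containing only regex assignments.
program          : regex_assignment* EOF ;
regex_assignment : CONSTANT '=' REGEX_LITERAL ;

// Ruby-style constant name, e.g. VALID_EMAIL_REGEX.
CONSTANT : [A-Z] [A-Z0-9_]* ;

// The whole literal /.../flags becomes ONE token; the trailing letters are
// the option flags (e.g. i). Assumes '/' inside the body is written as '\/'.
REGEX_LITERAL : '/' REGEX_CHAR+ '/' [a-z]* ;

fragment REGEX_CHAR
    : '\\' ~[\r\n]     // an escape sequence such as \A, \w, \- or \/
    | ~[/\\\r\n]       // any other character except '/', '\' or a newline
    ;

WS : [ \t\r\n]+ -> skip ;
```

In a full Ruby grammar this rule will also fire on division: in `a / b / c` the lexer would produce `/ b /` as a regex token. A common fix is to guard the rule with a semantic predicate that checks whether the previous token can end an expression. If you want to discard the literal entirely, `-> skip` or `-> channel(HIDDEN)` on `REGEX_LITERAL` would do that, but then no parser rule may expect the token.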

score 1 · Answer 1 · edited May 23 '17 at 12:10

Unfortunately, there is no regex grammar yet in the ANTLR grammar repository, but similar questions have come up before, e.g. Regex Grammar. Once you have the (E)BNF, you can convert it to ANTLR. Alternatively, you can use the BNF grammar to check your own grammar rules and see whether they are correctly defined. Simply throwing all possible input characters together in a single rule won't work.
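To illustrate what such a conversion can look like, here is a deliberately small sketch of an ANTLR grammar for the *body* of a regex (the part between the slashes). It is not taken from any published grammar; the name `TinyRegex` and all rule names are hypothetical. It covers only alternation, grouping, the `*`/`+`/`?` quantifiers, character classes, escapes and `.`; counted quantifiers like `{2,3}` and named groups are not handled specially.

```antlr
grammar TinyRegex;

// EBNF-style structure: alternation > concatenation > quantified atom.
regex         : alternation EOF ;
alternation   : concatenation ('|' concatenation)* ;
concatenation : quantified* ;
quantified    : atom QUANTIFIER? ;
atom          : '(' alternation ')' | CLASS | ESCAPE | '.' | CHAR ;

QUANTIFIER : [*+?] ;

// A character class such as [\w+\-.]; escapes inside it are allowed.
CLASS  : '[' ( '\\' . | ~[\]\\] )+ ']' ;

// An escape such as \A, \d or \. outside a class.
ESCAPE : '\\' . ;

// Any other single character. A '[' that starts a complete class is
// matched by CLASS instead, because the lexer prefers the longest match.
CHAR   : ~[|()\\.*+?] ;
```

Fed the body of `VALID_EMAIL_REGEX`, this should yield a single `concatenation` of quantified atoms; the point is that each regex construct gets its own rule instead of all characters being lumped into one token.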

answered May 09 '17 at 07:14 by Mike Lischke

score 0 · Answer 2 · answered May 02 '23 at 08:52

There is a regex grammar now (since 2019): https://github.com/antlr/grammars-v4/tree/master/xsd-regex (note that it targets the XML Schema regex dialect, not Ruby's).

answered May 02 '23 at 08:52 by Sirmabus