JavaScript lexer: dealing with "/"

Question

Possible Duplicate:
Division/RegExp conflict while tokenizing Javascript

I'm writing a JS lexer for fun, and there's just one piece missing: the part that handles regex literals.

Take, for instance, the following valid piece of JS code: `/ab+c/;`

How can a JS lexer know whether it's dealing with a regex or with `[Operator('/'), Identifier('ab'), Operator('+'), Identifier('c'), Operator('/'), Semicolon]`?

I've never worked on a project like this, but I'd have thought that any `/` character not preceded by an identifier would be parsed as the beginning of a regex. — nrabinowitz, Jan 27 '13 at 22:51
I feel this is the task of the parser, not that of the lexer. — , Jan 27 '13 at 22:53
Please make your question title more descriptive than "JavaScript Lexer", which does nothing to describe (briefly) what your question is about. — Jared Farrish, Jan 27 '13 at 23:09
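The rule suggested in the first comment is close to a heuristic that many hand-written JavaScript tokenizers use: look at the last significant token before the `/`. If that token can end an operand (an identifier, a number, a string, a closing `)` or `]`), the `/` is division; otherwise it starts a regex. This is a minimal sketch, assuming a hypothetical token shape `{ type, value }` and a hand-picked keyword list. It is only an approximation and not what the specification does. For example, it misreads `if (x) /re/.test(s)` (a `)` before a regex) and postfix `a++ / 2`.

```javascript
// Heuristic sketch: decide what a "/" means from the previous significant token.
// Assumes tokens shaped like { type, value } (a hypothetical format).

// Keywords after which an expression, and therefore a regex, may begin.
// Hand-picked for illustration; not an exhaustive list.
const KEYWORDS_BEFORE_EXPRESSION = new Set([
  "return", "typeof", "instanceof", "in", "new", "delete",
  "void", "throw", "case", "do", "else",
]);

// Returns true if a "/" after `prev` should start a regex literal,
// false if it should be read as the division operator.
function slashStartsRegex(prev) {
  if (prev === null) return true; // start of input: nothing to divide
  switch (prev.type) {
    case "identifier":
      // `x / y` divides, but `return /re/` starts a regex.
      return KEYWORDS_BEFORE_EXPRESSION.has(prev.value);
    case "number":
    case "string":
    case "regex":
      return false; // these end an operand, so "/" divides
    case "punctuator":
      // ")" and "]" usually close an operand: `(a) / 2`, `arr[0] / 2`.
      // Known miss: `if (x) /re/.test(s)` is treated as division.
      return prev.value !== ")" && prev.value !== "]";
    default:
      return true;
  }
}

// Usage: the same "/ab+c/" text in different contexts.
const contexts = [
  [null, "/ab+c/;"],
  [{ type: "punctuator", value: "=" }, "y = /ab+c/;"],
  [{ type: "identifier", value: "return" }, "return /ab+c/;"],
  [{ type: "identifier", value: "x" }, "x /ab+c/ 2;"],
  [{ type: "punctuator", value: ")" }, "(x) /ab+c/ 2;"],
];
for (const [prev, code] of contexts) {
  console.log(code, "->", slashStartsRegex(prev) ? "regex" : "division");
}
```

The appeal is that the lexer stays standalone. The cost is that every ambiguous case has to be patched by hand.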

score 3 · Answer 1 · answered Jan 27 '13 at 22:59

3

How can a JS lexer know whether it's dealing with a regex or with [some expression with operator / inside]?

Well, the lexer can't tell on its own. This is something the parser should decide.
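The ECMAScript specification splits the work the same way. Its lexical grammar has two goal symbols: InputElementDiv, used where a division `/` or `/=` operator is permitted, and InputElementRegExp, used everywhere else. The syntactic context determines which one applies. Below is a minimal sketch of that idea, assuming a toy grammar of identifiers, integers, regexes, `+ - * /` and parentheses. The `Lexer` class, its `next(goal)` method, and the `parseExpression`/`parseOperand`/`tokenize` helpers are all hypothetical names. The parser asks for a `"regexp"`-goal token when it expects an operand and a `"div"`-goal token when it expects an operator.

```javascript
// Sketch: the parser picks the lexical goal, echoing ECMAScript's
// InputElementDiv / InputElementRegExp split. Toy grammar only.

class Lexer {
  constructor(src) {
    this.src = src;
    this.pos = 0;
  }

  // goal: "div" when an operator is expected, "regexp" when an operand is.
  next(goal) {
    const src = this.src;
    while (this.pos < src.length && /\s/.test(src[this.pos])) this.pos++;
    if (this.pos >= src.length) return { type: "eof", value: "" };
    const start = this.pos;
    const ch = src[this.pos];
    if (ch === "/" && goal === "regexp") {
      // Regex literal. Escapes and [...] classes are ignored in this toy.
      this.pos++;
      while (this.pos < src.length && src[this.pos] !== "/") this.pos++;
      if (this.pos >= src.length) throw new SyntaxError("unterminated regex");
      this.pos++; // closing "/"
      while (this.pos < src.length && /[A-Za-z]/.test(src[this.pos])) this.pos++; // flags
      return { type: "regex", value: src.slice(start, this.pos) };
    }
    if (/[A-Za-z_$]/.test(ch)) {
      while (this.pos < src.length && /[\w$]/.test(src[this.pos])) this.pos++;
      return { type: "identifier", value: src.slice(start, this.pos) };
    }
    if (/[0-9]/.test(ch)) {
      while (this.pos < src.length && /[0-9]/.test(src[this.pos])) this.pos++;
      return { type: "number", value: src.slice(start, this.pos) };
    }
    this.pos++; // any other character is a one-character punctuator
    return { type: "punctuator", value: ch };
  }
}

// Operand := identifier | number | regex | "(" Expression ")"
function parseOperand(lexer, tokens) {
  const tok = lexer.next("regexp"); // an operand is expected: "/" opens a regex
  if (tok.type === "punctuator" && tok.value === "(") {
    tokens.push(tok);
    const stop = parseExpression(lexer, tokens);
    if (stop.value !== ")") throw new SyntaxError("expected )");
    tokens.push(stop);
  } else if (["identifier", "number", "regex"].includes(tok.type)) {
    tokens.push(tok);
  } else {
    throw new SyntaxError(`unexpected ${tok.type} ${tok.value}`);
  }
}

// Expression := Operand (("+" | "-" | "*" | "/") Operand)*
// Returns the first token that does not continue the expression.
function parseExpression(lexer, tokens) {
  parseOperand(lexer, tokens);
  for (;;) {
    const tok = lexer.next("div"); // an operator is expected: "/" divides
    if (tok.type !== "punctuator" || !"+-*/".includes(tok.value)) return tok;
    tokens.push({ type: "operator", value: tok.value });
    parseOperand(lexer, tokens);
  }
}

function tokenize(src) {
  const tokens = [];
  const stop = parseExpression(new Lexer(src), tokens);
  if (stop.type !== "eof") tokens.push(stop); // e.g. a trailing ";"
  return tokens.map((t) => `${t.type}(${t.value})`).join(" ");
}

console.log(tokenize("/ab+c/;"));
// regex(/ab+c/) punctuator(;)
console.log(tokenize("a /ab+c/ 2"));
// identifier(a) operator(/) identifier(ab) operator(+) identifier(c) operator(/) number(2)
```

Because the parser pulls tokens on demand, the lexer never has to guess. The same `/` character becomes a regex delimiter or an operator depending on what the parser expects at that point.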

answered Jan 27 '13 at 22:59

score 0 · Accepted Answer · answered Jan 27 '13 at 22:58

You would need to implement a lexical grammar that includes regular expression literals. According to the ECMAScript documentation, "A RegExp grammar for ECMAScript is given in 15.10":

"The form and functionality of regular expressions is modeled after the regular expression facility in the Perl 5 programming language."

See also: ECMAScript Lexical Conventions
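Once the lexer knows that a `/` starts a regex, the lexical grammar (ES5 section 7.8.5, Regular Expression Literals) tells it where the literal ends:

- A backslash escapes the next character.
- A `/` inside a `[...]` class does not close the literal.
- Line terminators are not allowed.
- The body cannot start with `*` or `/`, because `/*` and `//` begin comments.
- Flags follow the closing `/`.

Below is a minimal sketch, assuming `src` is the source text and `start` is the index of the opening `/`. `scanRegexLiteral` is a hypothetical name, and flags are approximated as ASCII identifier characters. It only finds the literal's boundaries. Validating the pattern itself against the 15.10 grammar would be a separate step.

```javascript
// Sketch: find the end of a regex literal, following the shape of ES5 7.8.5.
// Assumes `start` indexes the opening "/" and the caller already decided
// (from context) that this "/" begins a regex rather than a division.

function isLineTerminator(ch) {
  return ch === "\n" || ch === "\r" || ch === "\u2028" || ch === "\u2029";
}

// Returns { body, flags, end }, where `end` is the index just past the flags.
function scanRegexLiteral(src, start) {
  if (src[start] !== "/") throw new SyntaxError("expected /");
  // "//" and "/*" begin comments, so a regex body cannot start with "/" or "*".
  if (src[start + 1] === "/" || src[start + 1] === "*") {
    throw new SyntaxError("comment, not a regex literal");
  }
  let i = start + 1;
  let inClass = false; // inside [...] an unescaped "/" does not close the literal
  for (;;) {
    if (i >= src.length || isLineTerminator(src[i])) {
      throw new SyntaxError("unterminated regex literal");
    }
    const ch = src[i];
    if (ch === "\\") {
      i++; // skip the backslash; the next character is taken as-is
      if (i >= src.length || isLineTerminator(src[i])) {
        throw new SyntaxError("unterminated regex literal");
      }
    } else if (ch === "[") {
      inClass = true;
    } else if (ch === "]") {
      inClass = false;
    } else if (ch === "/" && !inClass) {
      break; // the closing delimiter
    }
    i++;
  }
  const body = src.slice(start + 1, i);
  i++; // step past the closing "/"
  const flagsStart = i;
  // Flags: approximated here as ASCII identifier characters.
  while (i < src.length && /[A-Za-z0-9_$]/.test(src[i])) i++;
  return { body, flags: src.slice(flagsStart, i), end: i };
}

console.log(JSON.stringify(scanRegexLiteral("/ab+c/;", 0)));
// {"body":"ab+c","flags":"","end":6}

// The "/" inside [...] and the escaped "\/" do not end the literal;
// the body here is [/]\/+ and the flags are "gi".
console.log(JSON.stringify(scanRegexLiteral("x = /[/]\\/+/gi.test(s)", 4)));
```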

JavaScript lexer: dealing with "/"

2 Answers2