2

Possible Duplicate:
Division/RegExp conflict while tokenizing Javascript

I'm writing a JS lexer for fun and there's just one piece missing: the part that consumes regex literals.

Take, for instance, the following valid piece of JS code: `/ab+c/;`

How can a JS lexer know whether it's dealing with a regex or with
[Operator('/'), Identifier('ab'), Operator('+'), Identifier('c'), Operator('/'), Semicolon] ?

adrianton3
  • I've never worked on a project like this, but I'd have thought that any `/` character not preceded by an identifier would be parsed as the beginning of a regex. – nrabinowitz Jan 27 '13 at 22:51
  • I feel this is the task of the parser, not that of the lexer. –  Jan 27 '13 at 22:53
  • Please make your question title more descriptive than "JavaScript Lexer", which does nothing to describe (briefly) what your question is about. – Jared Farrish Jan 27 '13 at 23:09

2 Answers

3

How can a JS lexer know whether it's dealing with a regex or with [some expression with operator / inside]?

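Well, the lexer can't, at least not on its own. Whether a `/` starts a regex literal or is a division operator depends on the syntactic context around it, so this is something the parser should decide (or tell the lexer).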
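If you don't want to wire the lexer into a full parser, a common approximation is to look at the previous token: after an identifier, a number, or a closing `)` / `]`, a `/` is treated as division; anywhere else it is treated as the start of a regex. Here is a rough sketch of that idea. The names `tokenize` and `regexAllowedAfter` and the token shapes are made up for this illustration, the keyword list is incomplete, comments and multi-character operators are not handled, and the regex body scan is deliberately naive. The heuristic is also known to get some cases wrong (for example `if (x) /re/.test(y)` or a `/` after `}`), which is exactly why the parser is the right place for the decision.

```javascript
// Assumption: a small, incomplete set of keywords after which a regex may appear.
const KEYWORDS = new Set(["return", "typeof", "void", "delete", "in",
                          "instanceof", "new", "throw", "case", "do", "else"]);

// Hypothetical helper: may a `/` seen right after `prev` start a regex literal?
function regexAllowedAfter(prev) {
  if (prev === null) return true;            // start of input
  switch (prev.type) {
    case "Identifier":
    case "Number":
    case "RegExp":
      return false;                          // `a / b`, `1 / 2`
    case "Keyword":
      return true;                           // `return /re/`
    default:                                 // Punctuator
      // Treat `)`, `]` and `}` as ending an operand; this is only a guess.
      return prev.value !== ")" && prev.value !== "]" && prev.value !== "}";
  }
}

function tokenize(src) {
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (/\s/.test(ch)) { i++; continue; }    // skip whitespace
    const prev = tokens.length ? tokens[tokens.length - 1] : null;
    const rest = src.slice(i);
    let m;
    if ((m = /^[A-Za-z_$][\w$]*/.exec(rest))) {
      tokens.push({ type: KEYWORDS.has(m[0]) ? "Keyword" : "Identifier", value: m[0] });
      i += m[0].length;
    } else if ((m = /^\d+(\.\d+)?/.exec(rest))) {
      tokens.push({ type: "Number", value: m[0] });
      i += m[0].length;
    } else if (ch === "/" && regexAllowedAfter(prev)) {
      // Naive scan: handles backslash escapes but not `/` inside [...] classes.
      m = /^\/(?:\\.|[^\/\\\n])+\/[A-Za-z]*/.exec(rest);
      if (!m) throw new SyntaxError("Unterminated regex literal at index " + i);
      tokens.push({ type: "RegExp", value: m[0] });
      i += m[0].length;
    } else {
      tokens.push({ type: "Punctuator", value: ch }); // one character at a time
      i++;
    }
  }
  return tokens;
}

console.log(tokenize("/ab+c/;").map(t => t.type));   // regex, then `;`
console.log(tokenize("a / b / c").map(t => t.type)); // two divisions
```

With this, `/ab+c/;` from the question comes out as one regex token followed by a semicolon, while `a / b / c` comes out as identifiers separated by two `/` operators.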

0

You would need to implement a lexical grammar that includes regex literals. According to the ECMAScript specification, "A RegExp grammar for ECMAScript is given in 15.10":

"The form and functionality of regular expressions is modeled after the regular expression facility in the Perl 5 programming language."

See also: ECMAScript Lexical Conventions
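Note that the lexer does not have to understand the full RegExp grammar just to find where a literal ends. As far as I can tell from the lexical conventions, it is enough to skip backslash escapes, remember that a `/` inside a `[...]` class does not terminate the literal, and reject line terminators. A minimal sketch of that scan follows; `scanRegexLiteral` and `isLineTerminator` are hypothetical names, and the sketch assumes the caller has already decided that the `/` at `start` begins a regex (and has already ruled out `//` and `/*` comments).

```javascript
// Hypothetical helper: the four characters ECMAScript treats as line terminators.
function isLineTerminator(ch) {
  return ch === "\n" || ch === "\r" || ch === "\u2028" || ch === "\u2029";
}

// Given `src` and the index of an opening `/`, returns the index just past the
// regex literal (closing `/` plus flags). Throws if the literal is malformed.
function scanRegexLiteral(src, start) {
  let i = start + 1;
  let inClass = false;                       // are we inside [...] ?
  for (;;) {
    if (i >= src.length || isLineTerminator(src[i])) {
      throw new SyntaxError("Unterminated regex literal");
    }
    const ch = src[i];
    if (ch === "\\") {
      // An escape consumes the next character, which must exist and must not
      // be a line terminator.
      if (i + 1 >= src.length || isLineTerminator(src[i + 1])) {
        throw new SyntaxError("Unterminated regex literal");
      }
      i += 2;
      continue;
    }
    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;
    else if (ch === "/" && !inClass) break;  // closing slash
    i++;
  }
  if (i === start + 1) throw new SyntaxError("Empty regex body"); // `//` is a comment
  i++;                                       // step over the closing `/`
  while (i < src.length && /[A-Za-z0-9_$]/.test(src[i])) i++;     // flags
  return i;
}

const src = "/[/]\\//g;";                    // the literal /[/]\//g followed by ;
console.log(src.slice(0, scanRegexLiteral(src, 0)));
```

Validating what is inside the literal (quantifiers, groups, valid flags) can then be left to a later stage, or to the host `RegExp` constructor if you are running inside a JS engine anyway.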

Travis J