I'm starting to write a code syntax highlighter in JavaScript, and I want to highlight text that is in quotes (both "s and 's) in a certain color. It also needs to cope with one type of quote appearing inside a pair of the other type (e.g. "it's" or 'say "hi"'), but I'm really not sure where to even start. I'm not sure how I should go about finding the quotes and then finding the matching end quote.
Would a regex like this one help? http://stackoverflow.com/questions/249791/regexp-for-quoted-string-with-escaping-quotes – Oren Trutner Jul 29 '09 at 07:56
4 Answers
Unless you're doing this for the challenge, have a look at Google Code Prettify.
For your problem, you could read up on parsing (and lexers) on Wikipedia. It's a huge topic, and you'll soon run into bigger problems than parsing strings.
To start, you could use regular expressions (although they rarely match the accuracy of a true lexer). A typical regular expression for matching a double-quoted string is:
/"(?:[^"\\]|\\.)*"/
And then the same for ' instead of ".
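As a rough sketch of the regex route, assuming you only need the text and offsets of each string literal: the double- and single-quoted patterns can be joined with `|`, so a single left-to-right scan decides which kind of string starts first, and the other quote type inside it is just content. The inner repetition is written as `(?:[^"\\]|\\.)*` (one character per step), which matches the same strings as a `[^"\\]+` form but avoids runaway backtracking when a string is left unterminated while the user is typing. The function name `findStrings` is hypothetical.

```javascript
// The pattern for "..." strings and its twin for '...' strings.
// Each body allows any character except its own quote or a backslash,
// or a backslash followed by any character (an escape such as \" or \\).
const DOUBLE = /"(?:[^"\\]|\\.)*"/;
const SINGLE = /'(?:[^'\\]|\\.)*'/;

// Joining them with | lets one left-to-right scan pick whichever string starts
// first, so a ' inside "..." (or a " inside '...') is treated as plain content.
const STRING = new RegExp(`${DOUBLE.source}|${SINGLE.source}`, "g");

// Hypothetical helper: returns every string literal in `code` with its start offset.
function findStrings(code) {
  return [...code.matchAll(STRING)].map((m) => ({ text: m[0], index: m.index }));
}

const sample = `say("it's fine"); say('a "quoted" word'); say("esc \\" ok");`;
console.log(findStrings(sample).map((s) => s.text));
```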
Otherwise, for a character-by-character parser, you would set some kind of state saying you're in a string once you hit " (remembering which quote opened it, so a ' inside "..." is just content). Then, when you hit a " that is not preceded by an odd number of backslashes (an even number of backslashes escape each other), you exit the string.
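Here is a minimal sketch of that state machine. `scanStrings` is a hypothetical name, and it returns `[start, end)` offsets rather than HTML so you can feed them to whatever does the coloring. Instead of counting backslashes backwards, it skips the character after every backslash, which gives the same odd/even behavior in a single forward pass.

```javascript
// Minimal character-by-character scanner for '...' and "..." strings.
// Returns [start, end) offsets of each string literal, quotes included.
function scanStrings(code) {
  const spans = [];
  let quote = null; // the character that opened the current string, or null
  let start = 0;

  for (let i = 0; i < code.length; i++) {
    const ch = code[i];
    if (quote === null) {
      // Outside a string: either quote character opens one.
      if (ch === '"' || ch === "'") {
        quote = ch;
        start = i;
      }
    } else if (ch === "\\") {
      // Inside a string: a backslash escapes the next character, so skipping it
      // means \\ consumes both backslashes and \" never closes the string.
      i++;
    } else if (ch === quote) {
      // Only the same quote character that opened the string can close it.
      spans.push([start, i + 1]);
      quote = null;
    }
  }
  if (quote !== null) spans.push([start, code.length]); // unterminated string
  return spans;
}

// The last string ends with an escaped backslash (two backslashes), then closes.
const code = `a = "it's"; b = 'say "hi"'; c = "ends with \\\\";`;
console.log(scanStrings(code).map(([s, e]) => code.slice(s, e)));
```

A real highlighter would track comments (and, in JavaScript, regex literals) in the same loop, since the ' in a comment like `// don't` should not open a string.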

You can find quotes using regular expressions, but if you're writing a syntax highlighter, the only reliable way is to step through the code character by character and decide what to do from there.
Example of a regex:
/(["'])((?:\\.|(?!\1)[^\\])*)\1/g
(matches "this" and 'this' and "thi\"s")
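As a sketch of using a backreference pattern for actual highlighting, the code below wraps each literal in a `<span>`; the `str` class name and the `highlightStrings`/`escapeHtml` helpers are hypothetical. The pattern is written as `(["'])((?:\\.|(?!\1)[^\\])*)\1`: escapes are consumed as backslash-plus-next-character pairs and the lookahead stops at the matching quote, so a string such as `"a\\"` closes at the right place.

```javascript
// Group 1 captures the opening quote; \1 requires the same character to close.
// Escapes (\\.) are consumed as pairs; (?!\1) stops at the matching quote.
const STRING_RE = /(["'])((?:\\.|(?!\1)[^\\])*)\1/g;

// Minimal HTML escaping so the result can be assigned to innerHTML.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wraps each string literal in <span class="str">; the text between matches
// and the matches themselves are escaped separately.
function highlightStrings(code) {
  let out = "";
  let last = 0;
  for (const m of code.matchAll(STRING_RE)) {
    out += escapeHtml(code.slice(last, m.index));
    out += `<span class="str">${escapeHtml(m[0])}</span>`;
    last = m.index + m[0].length;
  }
  return out + escapeHtml(code.slice(last));
}

console.log(highlightStrings(`x = "a < 'b'" + 'c'`));
```

Escaping each piece after matching, rather than escaping the whole source first, keeps the regex working on the raw characters the user typed.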

I did it with a single regular expression in PHP using backreferences, and I think that's what you need if you really want to handle escaped backslashes correctly. Note that JavaScript regexes do support backreferences such as \1, as the regex in the answer above shows.