
I am trying to configure the TeXWorks editor to use the same syntax coloring as TeXMaker. TeXWorks uses regexes to specify what should be coloured, but it has no default rule for math.

I want to match everything between $ and $, between \[ and \], between \( and \), and between $$ and $$. The last case matters less because it is rarely used in LaTeX documents.

The solution may use more than one regex to cover all cases.

Of course, \$ is an escaped dollar sign, so I don't want to match that, nor \\[ etc.

Then I also want to match everything between \begin{equation} and \end{equation}, but that is simple.

'It cannot be done' is a possible answer.

marczellm
  • Why did the answer and comments disappear when I edited the question? – marczellm Jan 06 '13 at 14:20
  • Oli deleted his answer in response to your criticism so it’s not visible any more. A word on your comment there, though: `\(…\)` *can* be nested (consider `\(x = y + z \text{ where \(z\) is the error}\)` which is entirely valid). That’s one of the reasons to prefer it in favour of `$…$`. However, you might want to ignore that for simplicity’s sake. – Konrad Rudolph Jan 06 '13 at 14:22
  • @KonradRudolph Yep, that's not important. – marczellm Jan 06 '13 at 14:25

1 Answer


Try this PCRE regex (use the `x` flag so that whitespace and `#` comments in the pattern are ignored):

(?<!\\)    # negative look-behind to make sure start is not escaped 
(?:        # start non-capture group for all possible match starts
  # group 1, match dollar signs only 
  # single or double dollar sign enforced by look-arounds
  ((?<!\$)\${1,2}(?!\$))|
  # group 2, match escaped parenthesis
  (\\\()|
  # group 3, match escaped bracket
  (\\\[)|                 
  # group 4, match begin equation
  (\\begin\{equation\})
)
# if group 1 was start
(?(1)
  # non greedy match everything in between
  # group 1 matches do not support recursion
  (.*?)(?<!\\)
  # match ending double or single dollar signs
  (?<!\$)\1(?!\$)|  
# else
(?:
  # greedily and recursively match everything in between
  # groups 2, 3 and 4 support recursion
  (.*(?R)?.*)(?<!\\)
  (?:
    # if group 2 was start, escaped parenthesis is end
    (?(2)\\\)|  
    # if group 3 was start, escaped bracket is end
    (?(3)\\\]|     
    # else group 4 was start, match end equation
    \\end\{equation\}
  )
))))

See this regex in action: https://regex101.com/r/wP2aV6/25

Since this regex uses recursion it will handle nested mathematical expressions correctly.

This works only on PCRE-compatible regex engines. It requires advanced features such as negative lookbehind, conditional expressions and recursion, which are not present in all regex engines.

Unless you need something really simple, I would advise against using this regex and recommend a proper LaTeX parser instead.
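For engines without recursion, such as Python's built-in `re` module, the pattern above can be reduced to a variant that does not handle nesting. The following is only a sketch of that idea, not a drop-in replacement: `MATH_RE` and `find_math` are hypothetical names, the body is matched lazily, and `re.DOTALL` is my own assumption so that display math can span several lines.

```python
import re

# Non-recursive variant of the pattern above for Python's built-in `re`.
# Assumption: DOTALL is enabled so that a body may span several lines.
MATH_RE = re.compile(
    r"""
    (?<!\\)                        # opening delimiter must not be escaped
    (?:
        ((?<!\$)\${1,2}(?!\$))     # group 1: one or two dollar signs
      | (\\\()                     # group 2: backslash + open parenthesis
      | (\\\[)                     # group 3: backslash + open bracket
      | (\\begin\{equation\})      # group 4: begin of equation environment
    )
    (.*?)                          # group 5: body, non-greedy, no nesting
    (?<!\\)                        # closing delimiter must not be escaped
    (?(1) (?<!\$)\1(?!\$)          # group 1 opened: same dollar run closes
      | (?(2) \\\)                 # group 2 opened: backslash + close parenthesis
        | (?(3) \\\]               # group 3 opened: backslash + close bracket
          | \\end\{equation\}      # otherwise: end of equation environment
    )))
    """,
    re.VERBOSE | re.DOTALL,
)


def find_math(text):
    """Return every matched math span, delimiters included."""
    return [m.group(0) for m in MATH_RE.finditer(text)]


if __name__ == "__main__":
    print(find_math(r"The variable $x$ can be written as $y$"))
    print(find_math(r"Costs \$5 and $a+b$"))
```

Because the body is lazy, two inline formulas on one line are reported separately, and an escaped `\$` is skipped by the leading lookbehind. Nested math such as `\(…\)` inside `\text{…}` is not handled by this variant.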

Lodewijk Bogaards
  • Unfortunately it seems the regex engine in TeXWorks (which is probably Qt's QRegExp, by the way) doesn't support some of the features you used. Quote: ["lookbehind assertions, independent subexpressions and conditional expressions are not supported"](http://doc.qt.digia.com/qt/qregexp.html) I understand these features are necessary, so it seems I'm out of luck here. This does not concern your regex, which is correct anyway, so thank you for your work. TeXWorks should switch to another regex engine to support math highlighting. – marczellm Jan 26 '13 at 16:00
  • Would it be possible to detect the following: The opening $ must have a character immediately to its right, while the closing $ must have a character immediately to its left. Thus, $20,000 and $30,000 won’t parse as math. So we can have $20 dollars and $\sum_{i=1}^{\infty}$ – jmlopez Mar 13 '13 at 22:13
  • do you mean a space instead of a character? – Lodewijk Bogaards Mar 28 '14 at 11:07
  • This answer helped me immensely. Just a note, though, that I found a slight bug (that's visible even on the Regex101 link): multiple inline equations get slurped too greedily. For example, the phrase `The variable $x$ can be written as $y$` is matched as `$x$ can be written as $y$`. I prefer it to match twice--`$x$` and `$y$`--so I've changed one line to `(.*?(?R)?.*?)`. Final note, if you're doing this regexp in Ruby 1.9 (or some other non-PCRE derivative), you can rewrite this line as `(.*?(\g<1>)?.*?)`. –  Apr 21 '14 at 22:33
  • I am getting `unexpected end of pattern` when using the above in Python. – Nitesh Verma Aug 23 '17 at 17:04
  • @NiteshVerma Indeed, this regex requires recursive matching, which is not supported by Python's regex engine. You can take the recursion out and get something that works, but without support for nested expressions. Just replace the `(.*?(?R)?.*?)` line with `(.*?)`. – Lodewijk Bogaards Aug 23 '17 at 22:43
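The heuristic jmlopez describes in the comments (the opening `$` is directly followed by a character, the closing `$` is directly preceded by one) can be sketched in Python for single-dollar inline math. This reads "character" as "non-whitespace character", which is an assumption on my part, and it additionally forbids an unescaped `$` inside the body so that a stray currency sign cannot swallow a later formula. `INLINE_DOLLAR_RE` and `find_inline_math` are hypothetical names.

```python
import re

# Sketch of the "no space just inside the dollar signs" heuristic.
# Assumption: "character" in the comment means "non-whitespace character".
INLINE_DOLLAR_RE = re.compile(
    r"""
    (?<!\\)\$            # unescaped opening dollar sign
    (?=\S)               # must be followed directly by a non-space character
    (?:\\.|[^$\\])+?     # body: escaped pairs, or anything except $ and backslash
    (?<=\S)              # closing dollar must directly follow a non-space character
    \$                   # closing dollar sign
    """,
    re.VERBOSE,
)


def find_inline_math(text):
    """Return inline $...$ spans that satisfy the heuristic."""
    return [m.group(0) for m in INLINE_DOLLAR_RE.finditer(text)]


if __name__ == "__main__":
    print(find_inline_math("$20,000 and $30,000"))
    print(find_inline_math(r"So we can have $20 dollars and $\sum_{i=1}^{\infty}$"))
```

With this reading, the two currency amounts are rejected because the only candidate closing `$` is preceded by a space, while the `\sum` formula is still picked up. It remains a heuristic: text such as `$5 or$` would still be treated as math.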