Regular expression for LaTeX with escaped } (curly brace) needed

Question

I just started writing a C program that converts some LaTeX into HTML code. In my opinion the best way is to use regular expressions, yet I cannot make this simple idea work with PCRE: replace something like \term{abc} with [pre]abc[/pre] (\term is a LaTeX command of my own). Here's the catch:

How do I handle escaped curly braces (\}) in \term?
How do I handle pairs like {}?
How do I make the regular expression so greedy that it consumes the first of many \term commands, but not all of them?

Well, that's a lot of questions to figure out. I hope somebody can help.

PS: I'm sorry if, in any case, I have overlooked an answer to a similar question...

These are really three separate questions. You will likely get better responses if you break this up. — Tim, Jan 18 '12 at 20:31

score 2 · Accepted Answer · edited May 23 '17 at 11:48

See perlfaq6(1), "Can I use Perl regular expressions to match balanced text?". That said, since LaTeX's complexity seems similar to (if not worse than) that of (X)HTML, you might want to heed the words of "RegEx match open tags except XHTML self-contained tags".
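For the narrow \term{...} case in the question, the balanced-brace matching can also be done with a small hand-written scanner instead of a regular expression. The following is only a minimal sketch of that alternative, not something taken from perlfaq6 or PCRE. It assumes that \term is followed immediately by {, that a backslash escapes the next character (so \} and \{ do not count as braces), and that the argument is copied verbatim without unescaping. The function names find_group_end, append_bytes and convert_terms are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* p points just past an opening '{'. Returns a pointer to the matching '}',
 * or NULL if the group is never closed. A backslash escapes the next
 * character, so "\}" and "\{" leave the nesting depth unchanged. */
static const char *find_group_end(const char *p)
{
    int depth = 1;
    for (; *p != '\0'; p++) {
        if (*p == '\\') {
            if (p[1] == '\0')
                return NULL;          /* dangling backslash at end of input */
            p++;                      /* skip the escaped character */
        } else if (*p == '{') {
            depth++;
        } else if (*p == '}') {
            if (--depth == 0)
                return p;
        }
    }
    return NULL;
}

/* Appends n bytes of s to dst and keeps dst NUL-terminated.
 * Returns -1 if the result would not fit into cap bytes. */
static int append_bytes(char *dst, size_t cap, size_t *len,
                        const char *s, size_t n)
{
    if (*len + n >= cap)
        return -1;
    memcpy(dst + *len, s, n);
    *len += n;
    dst[*len] = '\0';
    return 0;
}

/* Copies src to dst, rewriting every \term{...} as [pre]...[/pre].
 * Returns 0 on success and -1 if dst is too small. */
int convert_terms(const char *src, char *dst, size_t cap)
{
    static const char cmd[] = "\\term{";
    const size_t cmd_len = sizeof cmd - 1;
    size_t len = 0;

    if (cap == 0)
        return -1;
    dst[0] = '\0';

    while (*src != '\0') {
        if (strncmp(src, cmd, cmd_len) == 0) {
            const char *body = src + cmd_len;
            const char *end = find_group_end(body);
            if (end != NULL) {
                if (append_bytes(dst, cap, &len, "[pre]", 5) != 0 ||
                    append_bytes(dst, cap, &len, body, (size_t)(end - body)) != 0 ||
                    append_bytes(dst, cap, &len, "[/pre]", 6) != 0)
                    return -1;
                src = end + 1;        /* resume right after this one command */
                continue;
            }
        }
        /* Copy a backslash together with the character it escapes, so that
         * "\\term{x}" (escaped backslash, then plain text) is left alone. */
        size_t n = (*src == '\\' && src[1] != '\0') ? 2 : 1;
        if (append_bytes(dst, cap, &len, src, n) != 0)
            return -1;
        src += n;
    }
    return 0;
}
```

Because the scanner stops at the first brace that brings the depth back to zero, each \term command is consumed on its own, which covers the "first of many, but not all" requirement without any greedy or non-greedy tuning. Anything beyond this one command (optional arguments, comments, verbatim environments) would still need a real parser or one of the existing converters.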

edited May 23 '17 at 11:48 by Community

answered Jan 18 '12 at 21:01 by jørgensen

I knew that answer would get a mention. I could smell Cthulhu. – Tim Jan 18 '12 at 21:14
Sigh, I was kinda hoping to avoid writing a "real" LaTeX parser and be able to work with PCRE instead. Seems my gut feeling was right in the first place... – smiter Jan 19 '12 at 09:22

score 0 · Answer 2 · answered Jan 20 '12 at 17:31

I don't know exactly what you need, but you might consider htlatex (part of TeX4ht), pandoc, or any of several other options. TeX is notoriously hard to parse.

answered Jan 20 '12 at 17:31 by Ivan Andrus
