1

I would like to create a parser that takes in any LaTeX formatted string and returns an expression that Python can evaluate.

I am having a couple of issues with fractions. Here are some examples:

| LaTeX (input) | Interoperable String (output) |
|---|---|
| \frac{1}{2} | ((1)/(2)) |
| \frac{x}{3b} | ((x)/(3b)) |
| \frac{2-m}{3} | ((2-m)/(3)) |
| \frac{7}{5+y} | ((7)/(5+y)) |

Here is what I have tried so far:

import re

fraction_re = re.compile(r"\\frac{(.*?)}{(.*?)}")

def parser(expression):

    fractions = fraction_re.findall(expression)

    for numerator, denominator in fractions:
        pattern = r"\\frac\{%s\}\{%s\}" % (numerator, denominator)
        replace = f"(({numerator})/({denominator}))"
        expression = re.sub(pattern=pattern, repl=replace, string=expression)

    return expression

This works fine for cases one and two (see table) but fails for cases three and four. I suspect that the - and + symbols are causing the problem, as they are themselves regex metacharacters.

I thought of adding some extra lines to escape them, e.g.

numerator = re.sub(pattern=r'\+', repl=r'\\+', string=numerator)

But this doesn't strike me as a good long term strategy. I have also tried adding square brackets to the pattern variable (as normal regex symbols in square brackets are not interpreted as such), i.e.

pattern = r"\\frac\{[%s]\}\{[%s]\}" % (numerator, denominator)

But this didn't work either.

Can anyone help me?

Thanks in advance.

p.s.

I know that this has been asked many times on SO before (e.g. "Python Regex to Simplify LaTex Fractions", "Using Python Regex to Simplify Latex Fractions", and "Using if-then-else conditionals with Python regex replacement"), but I feel that those questions are a little different from mine, and I have not been able to find an answer that helps me much.

Also I know that there already exist out-of-the-box parsers that do exactly what I'd want (for example: https://github.com/augustt198/latex2sympy) but I really would like to build this myself.

jda5
  • @Reti43 finding the fractions is not the issue. The issue is taking the matches and converting them into the format: `((numerator)/(denominator))` – jda5 Mar 23 '21 at 21:30
  • I think your regex should be something along the lines of `{[^}]+}` – Vishal Singh Mar 23 '21 at 21:53

2 Answers

1

I'm not sure why you're taking a two-stage approach; as you have noted it is causing you problems with regex meta characters in the second stage. You could just make the substitution as you match using re.sub:

import re

fraction_re = re.compile(r'\\frac{([^}]+)}{([^}]+)}')

def parser(expression):
    return fraction_re.sub(r'((\1)/(\2))', expression)

print(parser(r'\frac{1}{2}  \frac{x}{3b}   \frac{2-m}{3}   \frac{7}{5+y}'))

Output

((1)/(2))  ((x)/(3b))   ((2-m)/(3))   ((7)/(5+y))
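In the replacement string, `\1` and `\2` are backreferences to the first and second capture groups of the match. As a small sketch (the variable names and the extra digit example are only for illustration), the same replacement can also be written with the `\g<1>` form, which is the safer choice when a digit has to follow the group directly:

```python
import re

fraction_re = re.compile(r'\\frac{([^}]+)}{([^}]+)}')
source = r'\frac{2-m}{3}'

# \1 and \2 insert the text captured by groups 1 and 2.
by_number = fraction_re.sub(r'((\1)/(\2))', source)

# \g<1> and \g<2> are the explicit spelling of the same backreferences.
by_g = fraction_re.sub(r'((\g<1>)/(\g<2>))', source)

print(by_number)  # ((2-m)/(3))
print(by_g)       # ((2-m)/(3))

# \g<1>0 means "group 1, then a literal 0"; r'\10' would instead be read
# as a reference to group 10.
appended = re.sub(r'(\d)', r'\g<1>0', 'a1')
print(appended)   # a10
```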

Note that it's more efficient to use [^}]+ than .*? in your regex as it will reduce backtracking.
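For completeness, if you did want to keep the two-stage approach, the metacharacter problem can be avoided with the standard library's `re.escape`. This is only a sketch: `parser_two_stage` is a name made up for this example, and the single `re.sub` shown earlier remains the simpler route.

```python
import re

fraction_re = re.compile(r"\\frac\{(.*?)\}\{(.*?)\}")

def parser_two_stage(expression):
    """Illustrative two-pass version: find fractions, then replace each one."""
    for numerator, denominator in fraction_re.findall(expression):
        # re.escape backslash-escapes regex metacharacters such as + and -,
        # so the captured text is matched literally in the second pass.
        pattern = r"\\frac\{%s\}\{%s\}" % (
            re.escape(numerator),
            re.escape(denominator),
        )
        replacement = f"(({numerator})/({denominator}))"
        # A function as repl keeps any backslashes in the replacement literal.
        expression = re.sub(pattern, lambda m: replacement, expression)
    return expression

print(parser_two_stage(r"\frac{2-m}{3} \frac{7}{5+y}"))
# ((2-m)/(3)) ((7)/(5+y))
```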

Nick
  • Wow this is so succinct! I knew that what I was doing was a little inefficient as my code was going over the string twice. So just to be clear, do `\1` and `\2` mean group one and two? – jda5 Mar 24 '21 at 09:51
  • @JacobStrauss exactly. You can use named groups as in Jan's answer, but I think this is easier. – Nick Mar 24 '21 at 10:54
  • Thanks Nick! I appreciate the time you've taken to help me out :) – jda5 Mar 24 '21 at 10:56
1

You could use a simple lambda function within re.sub() as in:

import re

data = r"""
some very cool \textbf{Latex} stuff

\begin{enumerate}
\item even a very cool item
\end{enumerate}

Here comes the fun
\frac{1}{2} 
\frac{x}{3b}
\frac{2-m}{3}
\frac{7}{5+y}
"""

rx = re.compile(r'\\frac\{(?P<numerator>[^{}]+)\}\{(?P<denominator>[^{}]+)\}')

data = rx.sub(lambda m: f"(({m.group('numerator')})/({m.group('denominator')}))", data)
print(data)

Which will yield

some very cool \textbf{Latex} stuff

\begin{enumerate}
\item even a very cool item
\end{enumerate}

Here comes the fun
((1)/(2))
((x)/(3b))
((2-m)/(3))
((7)/(5+y))

The expression boils down to

\\frac\{(?P<numerator>[^{}]+)\}\{(?P<denominator>[^{}]+)\}

There is no real need to use named groups; they are only there to make the pattern crystal clear.
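One thing to be aware of: because `[^{}]+` excludes braces, the pattern only matches a fraction whose numerator and denominator contain no further `{...}` groups. A possible way to cope with nested fractions, sketched here under the assumption that repeated passes are acceptable, is to substitute from the inside out until nothing changes. The names `convert_nested` and `max_passes` are made up for this example.

```python
import re

FRACTION_RE = re.compile(r"\\frac\{([^{}]+)\}\{([^{}]+)\}")

def convert_nested(expression, max_passes=50):
    """Rewrite innermost fractions repeatedly until no \\frac matches remain."""
    for _ in range(max_passes):
        # subn returns the new string and the number of substitutions made.
        expression, count = FRACTION_RE.subn(r"((\1)/(\2))", expression)
        if count == 0:
            break
    return expression

print(convert_nested(r"\frac{1}{\frac{x}{2}}"))
# ((1)/(((x)/(2))))
```

Each pass only touches fractions that are already free of braces, so an outer fraction becomes matchable once its inner ones have been rewritten. Other brace groups (for example `x^{2}`) would still block a match and need their own handling.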

Jan
  • Jan this is a superb answer, thanks for helping me out! I didn't think to use lambda functions, and named groups are new to me. I learnt loads, thanks – jda5 Mar 24 '21 at 11:09