
I have the following code that parses arithmetic expressions using parsimonious. It works, but whitespace is included in the parse tree. How can we get rid of the whitespace and keep only the meaningful tokens? The Lark parsing library achieves this with `%ignore WS`. Is there something similar in parsimonious, or another way to achieve the same effect?

from parsimonious.grammar import Grammar

# Raw string, so the regex escapes \d and \s are not treated as Python escapes
g = r'''
    sum = (number plus sum) / (number plus prod)
    prod = (number times prod) / (left_par number plus prod right_par) / number
    number = (ws ~"[\d]+" ws) / (left_par sum right_par)
    plus = ws "+" ws
    times = ws "*" ws
    left_par = ws "(" ws
    right_par = ws ")" ws
    ws = ~"[\s]*"
    '''

grammar = Grammar(g)
print(grammar.parse(' (134 +77 + 56) + 10 * 30'))

This is the output:

<Node called "sum" matching " (134 +77 + 56) + 10 * 30">
    <Node matching " (134 +77 + 56) + 10 * 30">
        <Node called "number" matching " (134 +77 + 56) ">
            <Node matching " (134 +77 + 56) ">
                <Node called "left_par" matching " (">
                    <RegexNode called "ws" matching " ">
                    <Node matching "(">
                    <RegexNode called "ws" matching "">
                <Node called "sum" matching "134 +77 + 56">
                    <Node matching "134 +77 + 56">
                        <Node called "number" matching "134 ">
                            <Node matching "134 ">
                                <RegexNode called "ws" matching "">
                                <RegexNode matching "134">
                                <RegexNode called "ws" matching " ">
                        <Node called "plus" matching "+">
                            <RegexNode called "ws" matching "">
                            <Node matching "+">
                            <RegexNode called "ws" matching "">
                        <Node called "sum" matching "77 + 56">
                            <Node matching "77 + 56">
                                <Node called "number" matching "77 ">
                                    <Node matching "77 ">
                                        <RegexNode called "ws" matching "">
                                        <RegexNode matching "77">
                                        <RegexNode called "ws" matching " ">
                                <Node called "plus" matching "+ ">
                                    <RegexNode called "ws" matching "">
                                    <Node matching "+">
                                    <RegexNode called "ws" matching " ">
                                <Node called "prod" matching "56">
                                    <Node called "number" matching "56">
                                        <Node matching "56">
                                            <RegexNode called "ws" matching "">
                                            <RegexNode matching "56">
                                            <RegexNode called "ws" matching "">
                <Node called "right_par" matching ") ">
                    <RegexNode called "ws" matching "">
                    <Node matching ")">
                    <RegexNode called "ws" matching " ">
        <Node called "plus" matching "+ ">
            <RegexNode called "ws" matching "">
            <Node matching "+">
            <RegexNode called "ws" matching " ">
        <Node called "prod" matching "10 * 30">
            <Node matching "10 * 30">
                <Node called "number" matching "10 ">
                    <Node matching "10 ">
                        <RegexNode called "ws" matching "">
                        <RegexNode matching "10">
                        <RegexNode called "ws" matching " ">
                <Node called "times" matching "* ">
                    <RegexNode called "ws" matching "">
                    <Node matching "*">
                    <RegexNode called "ws" matching " ">
                <Node called "prod" matching "30">
                    <Node called "number" matching "30">
                        <Node matching "30">
                            <RegexNode called "ws" matching "">
                            <RegexNode matching "30">
                            <RegexNode called "ws" matching "">
Tarik
  • I don't think you can ignore them with the parser, but you should be able to ignore them when you write your `Visitor` class. – Josh Voigts Jan 10 '22 at 15:29
  • Just curious -- why not use Lark for it? – Erez Jan 15 '22 at 22:04
  • @Erez That's what I did. I found Lark much easier to use and dropped parsimonious. I was still curious to find out if there was a way to achieve the same effect as Lark in parsimonious. – Tarik Jan 16 '22 at 11:42
