3

I'm writing a compiler using uu-parsinglib and I saw a very strange thing. I defined a pChoice combinator like:

pChoice    = foldr (<<|>) pFail

(notice, I'm using greedy <<|>).

Lets consider following code:

pFactor i = pChoice [ Expr.Var    <$> pVar
                    , Expr.Lit    <$> pLit True
                    , L.pParensed (pExpr i)
                    -- , Expr.Tuple  <$> pTuple (pOpE i)
                    -- , Expr.List   <$> pLst   (pListE i)
                    ]

Each element starts with different character - Expr.Var starts with a letter, Expr.Lit with a number, L.pParensed with parenthesis (, Expr.Tuple with brace { and Expr.List with bracket [.

I've got a big test code in which there are no tuples and no lists. The code parses in 0.15s. When I uncomment the above lines, the time increases to 0.65s. This is over 400% slowdown... How is it possible? I'm using only greedy operators and I'm sure parser is not haning in Tuple nor List section, because in the whole code there is no tuple nor list.

If you would need more code or definitions, I'll of course poste it.

Wojciech Danilo
  • 11,573
  • 17
  • 66
  • 132
  • 1
    Might be worth cutting this down to a minimal complete example that demonstrates the slowdown - i.e. one small program that's fast and one with minimal changes from that program that's slow. – Ganesh Sittampalam Dec 31 '13 at 06:11
  • 1
    I am extrapolating *a lot* here, but 400% really looks like exponential growth in terms of the # of items in the pChoice (ie- each item doubles the number of possibilities). According to your description, exponential growth shouldn't happen, but then again, there is some bug, so it would be good to rule this out. Can you add another ostensibly k=1 resolvable option, and see if the slowdown goes to 800%, or comment out one and see if it is 200%? Exponential growth is a pretty common bug in parsers if you accidentally add choices that don't resolve right away. – jamshidh Dec 31 '13 at 06:53
  • For things like expression grammars, it's common to have nested productions (Parens, Tuple, List...) before simple expressions (Literal, Variable...). Where nested productions have prefixes ( '(', '{', '[' ...) this means the parser can reject them quickly. [Obviously when you start adding operators you need to worry about left-factoring] – stephen tetley Dec 31 '13 at 08:52

1 Answers1

2

I think the cause of the matter may lie in the fact that you have parameterised pFactor. This will cause each call to such a parser to build a new parser, which will take time. It is much better to create such parsers once and for all and share them in the actual parsing process. I cannot see how you are using this parser I cannot answer your questions any further.