Question
I know Parsec
and uu-parsinglib
and I've written parsers in both of them. Recently I discovered, that there is a problem in uu-parsinglib
, which could significantly affect its performance and I do not see a way to solve it.
Lets consider following Parsec parsers:
pa = char 'a'
pb = char 'b'
pTest = many $ try(pa <* pb)
What would be the equivalent in uu-parsinglib
? It would not be the following:
pa = pSym 'a'
pb = pSym 'b'
pTest = pList_ng (pa <* pb)
The difference is, that in Parsec
, many
would eat (pa <* pb)
(pairs of "ab"
) greedy until it is not matched anymore, while in uu-parsinglib
, pList_ng
is not greedy, so it will keep in memory possible backtrack ways after parsing each (pa <* pb)
.
Is there any way to write something like pList(try(pa <* pb))
in uu-parsinglib
?
Example
A good example would be
pExample = pTest <* (pa <* pb)
and a sample input of "ababab"
.
With Parsec
, we would get error (because pTest
is greedy parsing pairs of "ab"
), but in uu-parsinglib
, the string would be parsed with no problems.
Edit
We cannot switch from pList_ng
to pList
, because it would be not equivalent to Parsec
version. For example:
pExample2 = pTest <* pa
and a sample input of "ababa"
would success in Parsec
, but fail in uu-parsinglib
, when using greedy pList
.
Of course uu-parsinglib
will succeed if we use pList_ng
here, but it could be a lot slower for some inputs and rules. For example, considering the input "ababab"
, Parsec
would simply fail, because pTest
would consume whole string and pa
would not be matched. uu-parsinglib
will fail also, but checking a more steps - it will match whole string and fail, then throw away last "ab"
pair and fail again, etc. If we have some nested rules and funny input text, it will make a huge difference.
A little benchmark
To prove, that the problem exists in real, consider following grammar (in a pseudocode - but I think it is very intuitive):
pattern = many("ab") + "a"
input = many(pattern)
So as input to our program we get a string containing patterns, for example "abababaaba" contains 2 patterns "abababa" and "aba".
Lets make parsers in both libraries!
uu-parsinglib
:
import Data.Char
import qualified Text.ParserCombinators.UU as UU
import Text.ParserCombinators.UU hiding(parse)
import Text.ParserCombinators.UU.BasicInstances hiding (Parser)
import System.TimeIt (timeIt)
pa = pSym 'a'
pb = pSym 'b'
pTest = pList $ pList_ng ((\x y -> [x,y]) <$> pa <*> pb) <* pa
main :: IO ()
main = do
timeIt maininner
return ()
maininner = do
let (x,e) = UU.parse ((,) <$> pTest <*> pEnd) (createStr (LineColPos 0 0 0) (concat $ replicate 1000 (concat (replicate 1000 "ab") ++ "a")))
print $ length x
Parsec
:
import Control.Applicative
import Text.Parsec hiding (many, optional, parse, (<|>))
import qualified Text.Parsec as Parsec
import System.TimeIt (timeIt)
pa = char 'a'
pb = char 'b'
pTest = many $ many (try ((\x y -> [x,y]) <$> pa <*> pb)) <* pa
main :: IO ()
main = do
timeIt maininner2
return ()
maininner2 = do
let Right x = Parsec.runParser pTest (0::Int) "test" $ (concat $ replicate 1000 (concat (replicate 1000 "ab") ++ "a"))
print $ length x
Result? uu-parsinglib
is > 300% slower:
uu-parsinglib - 3.19s
Parsec - 1.04s
(compiled with ghc -O3
flag)