
I am in the process of creating a simple parsing library in Haskell, which compiles down the parser specification to optimized code using Template Haskell.

However, I am trying to figure out what kind of code is the most efficient to compile down to, so let us consider a simple hand-written parser snippet.

A parser can be seen as a function String -> (a, String), where a is whatever we want to recognize during parsing and the resulting string is whatever part of the input we have not matched yet. When parsers are run in sequence, the next parser should continue working on this leftover string. At the end we can say the full string was successfully parsed if the end result is of the shape (_, "").
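As a quick illustration of the sequencing (a hypothetical sketch; the type alias and the andThen helper are made up for this example and are not part of the library), the second parser simply continues on the leftover string of the first:

type Parser a = String -> (a, String)

-- Made-up combinator, only to show how the unconsumed input is threaded through.
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q str =
  let (x, rest)  = p str    -- run the first parser
      (y, rest') = q rest   -- the second parser continues on its leftover input
  in ((x, y), rest')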

In this example, we'll use a parser that parses any number of 'a' characters. (Error handling, i.e. reporting an error when any character other than 'a' is seen, has been left out of the code snippets to keep the question concise.)

Straightforward (naive?) implementation

If we write the code for this recursive parser by hand, a straightforward implementation might be:

parser_raw :: String -> (String, String)
parser_raw str =
  case str of
    [] -> ([], [])
    ('a' : rest) ->
      let
        (res', rest') = parser_raw rest
      in
        (('a' : res'), rest')

This matches any number of 'a's. The parser recurses until the string is empty.
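For instance, a small usage sketch (illustrative GHCi output, not part of the snippets above):

ghci> parser_raw "aaa"
("aaa","")
ghci> let (res, _) = parser_raw (repeat 'a') in take 3 res   -- only terminates because the let binding above is lazy
"aaa"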

Looking at the created GHC Core, we see the following output:

Rec {
-- RHS size: {terms: 36, types: 45, coercions: 0, joins: 0/1}
$wparser_raw
  = \ w_s7BO ->
      case w_s7BO of {
        [] -> (# [], [] #);
        : ds_d736 rest_a5S6 ->
          case ds_d736 of { C# ds1_d737 ->
          case ds1_d737 of {
            __DEFAULT -> case lvl6_r7Fw of wild2_00 { };
            'a'# ->
              let {
                ds3_s7vv
                  = case $wparser_raw rest_a5S6 of { (# ww1_s7C3, ww2_s7C4 #) ->
                    (ww1_s7C3, ww2_s7C4)
                    } } in
              (# : lvl2_r7Fs
                   (case ds3_s7vv of { (res'_a65m, rest'_a65n) -> res'_a65m }),
                 case ds3_s7vv of { (res'_a65m, rest'_a65n) -> rest'_a65n } #)
          }
          }
      }
end Rec }

To my untrained eye, it seems like this is not tail recursive, as we first do the recursive call and then still need to combine the output into a tuple afterwards. I also find it odd/interesting that the let from the source code is still there in the compiled code, which makes it seem like the computation really needs to be done in multiple (i.e. non-tail-recursive) steps.
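For contrast, here is a sketch of what a stricter variant would look like (this is not the code above; it only illustrates that replacing the lazy let binding with a case forces the recursive call before the outer pair is built):

-- Hypothetical stricter variant, for contrast only: the case forces the
-- recursive result before any constructor is produced, so this version can
-- no longer yield output lazily (e.g. on an infinite input stream).
parser_raw_strict :: String -> (String, String)
parser_raw_strict str =
  case str of
    [] -> ([], [])
    ('a' : rest) ->
      case parser_raw_strict rest of
        (res', rest') -> ('a' : res', rest')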

Two other approaches to write this code come to mind. (Maybe there are more?)

continuation-passing style

parser_cps :: String -> (String, String)
parser_cps str = parser_cps' str id
  where
    parser_cps' :: String -> ((String, String) -> (String, String)) -> (String, String)
    parser_cps' str fun =
      case str of
        [] -> fun ([], [])
        ('a' : rest) ->
          parser_cps' rest ((\(tl, final) -> (('a' : tl), final) ) . fun)

This results in:

Rec {
-- RHS size: {terms: 28, types: 29, coercions: 0, joins: 0/0}
parser_cps_parser_cps'
  = \ str_a5S8 fun_a5S9 ->
      case str_a5S8 of {
        [] -> fun_a5S9 lvl9_r7FM;
        : ds_d72V rest_a5Sa ->
          case ds_d72V of { C# ds1_d72W ->
          case ds1_d72W of {
            __DEFAULT -> lvl8_r7FL;
            'a'# ->
              parser_cps_parser_cps'
                rest_a5Sa
                (\ x_i72P ->
                   case fun_a5S9 x_i72P of { (tl_a5Sb, final_a5Sc) ->
                   (: lvl2_r7FF tl_a5Sb, final_a5Sc)
                   })
          }
          }
      }
end Rec }

-- RHS size: {terms: 4, types: 4, coercions: 0, joins: 0/0}
parser_cps = \ str_a5S6 -> parser_cps_parser_cps' str_a5S6 id

This looks tail-recursive, in the sense that we perform a tail call. However, it does seem like the result is built up as one large composed thunk. Are we using a lot of extra heap space until the base of the recursion is reached?
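To make that concrete, here is a rough hand evaluation (prepend is a made-up name for the lambda \(tl, final) -> ('a' : tl, final)):

--   parser_cps "aa"
-- = parser_cps' "aa" id
-- = parser_cps' "a"  (prepend . id)
-- = parser_cps' ""   (prepend . (prepend . id))
-- = (prepend . (prepend . id)) ([], [])
-- = ("aa", "")
--
-- The composed continuation grows by one closure per consumed character,
-- and nothing is produced until the base case finally applies it.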

the State monad

-- Using mtl's strict State monad
import Control.Monad.State.Strict

parser_state :: String -> (String, String)
parser_state = runState parser_state'
  where
    parser_state' :: State String String
    parser_state' = do
      str <- get
      case str of
        [] -> return []
        ('a' : rest) -> do
          put rest
          res <- parser_state'
          return ('a' : res)

This results in:

Rec {
-- RHS size: {terms: 26, types: 36, coercions: 0, joins: 0/0}
$wparser_state'
  = \ w_s7Ca ->
      case w_s7Ca of {
        [] -> (# [], [] #);
        : ds_d72p rest_a5Vb ->
          case ds_d72p of { C# ds1_d72q ->
          case ds1_d72q of {
            __DEFAULT -> case lvl11_r7FO of wild2_00 { };
            'a'# ->
              case $wparser_state' rest_a5Vb of { (# ww1_s7Cl, ww2_s7Cm #) ->
              (# : lvl2_r7FF ww1_s7Cl, ww2_s7Cm #)
              }
          }
          }
      }
end Rec }

-- RHS size: {terms: 8, types: 14, coercions: 6, joins: 0/0}
parser_state1
  = \ w_s7Ca ->
      case $wparser_state' w_s7Ca of { (# ww1_s7Cl, ww2_s7Cm #) ->
      (ww1_s7Cl, ww2_s7Cm) `cast` <Co:6>
      }

-- RHS size: {terms: 1, types: 0, coercions: 7, joins: 0/0}
parser_state = parser_state1 `cast` <Co:7>

Here, I'm not sure how to interpret the Core: the let from the parser_raw code is gone. Instead, the recursive call ends up immediately inside a case. Afterwards we still have to put the result in a tuple, but is this close enough to a tail call to please the recursion gods?
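My current reading (a simplified, hypothetical model of a strict State monad's bind, not mtl's actual source) is that the strict bind pattern-matches the result pair of each action with a case, which would explain why the recursive call is scrutinised directly instead of going through a lazy let:

-- Simplified sketch of a strict State-like bind, only to illustrate why the
-- Core above contains a case rather than a let: the result pair of the first
-- action is forced before the continuation runs.
newtype State' s a = State' { runState' :: s -> (a, s) }

bindStrict :: State' s a -> (a -> State' s b) -> State' s b
bindStrict m k = State' $ \s ->
  case runState' m s of          -- forces the pair, like the case in the Core
    (a, s') -> runState' (k a) s'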


So, to summarize: these are three techniques to write a simple parsing function. I would like to know which of these is the most memory-efficient, and how to get some intuition for recognizing tail recursion when looking at GHC Core output.

  • This can't be tail-recursive, because you need to do extra processing with the recursive return value before returning it up the chain. In order to achieve tail recursion, you have to rephrase the algorithm in such a way that the result of each step is the result of the next step with no alterations. But even then, you still need to remember the extra processing bits _somewhere_ until you get the recursive result. Which is why you get a series of thunks built up in the second approach. – Fyodor Soikin Aug 17 '21 at 12:42
  • But also, keep in mind that, due to normal-order evaluation, tail recursion in Haskell doesn't play as big of a role as in applicative-order languages (aka every other language). On the contrary, tail recursion may often be harmful in Haskell rather than helpful. – Fyodor Soikin Aug 17 '21 at 12:44
  • Thank you! What techniques should I use to get more information to compare these, then? Microbenchmarking? Is there anything useful to be gleaned from looking at the GHC Core, or is it treacherous to optimize based on that? – Qqwy Aug 17 '21 at 13:18
  • If you're trying to optimize for performance, then benchmarking for sure. But also, consider why you are trying in the first place. Have you discovered existing solutions not to be performant enough? – Fyodor Soikin Aug 17 '21 at 13:28
  • Note that in your first example, the compiler *must* leave the let in the code. Removing it (changing it to a case) would change the semantics wrt laziness. If you don't want laziness you need to write code that is explicitly not lazy - use case instead of let. – Carl Aug 17 '21 at 15:49
  • See also "guarded recursion", which is a property analogous to tail recursion. It's often useful to ensure that your lazy recursive algorithms are guarded, just as it's desirable for a strict recursive algorithm to be tail recursive. – amalloy Aug 17 '21 at 16:57
  • @Carl Thank you very much! You are right! Contrary to my earlier intuition (I was not aware that `let` was more lazy than `case`), it seems only `parser_raw` will work correctly on infinite (or any 'too large to fit in memory at once') streams, exactly because of the extra laziness introduced by the `let`. (Ex: `(head_of_result : _, _) = parser_raw (repeat 'a'); head_of_result`). So I guess that for my purposes, that version is the right one to use. – Qqwy Aug 17 '21 at 19:09
  • ... and switching from `Control.Monad.State.Strict` to `Control.Monad.State.Lazy` allows that version to work the same way (resulting in identical Core output) as well. – Qqwy Aug 17 '21 at 19:22
