
I use (abuse) parsers to do some string transformations, e.g. `normalizeWS :: Parser String` removes duplicate whitespace and `normalizeCase` maps specific strings to lower case. I use parsers because the input data has some structure; for example, literal strings have to be left untransformed. Is there an elegant way to feed the output of one parser as input to the next and thus form a transformation pipeline? Something in the vein of `normalizeWS . normalizeCase` (which of course doesn't work)?

Many thanks in advance!

jules
  • I don't think you can compose `Parser`s that way since they would both be reading from the underlying stream. I think you may be better defining each as `String -> String`, and when you have a `Parser String` you'd like to normalize you could `fmap (normalizeWS . normalizeCase)`. – ryachza Oct 19 '17 at 18:30
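The comment's suggestion can be sketched with plain string functions composed outside the parser. This is a minimal, hypothetical sketch: the bodies below are simplified stand-ins for the question's real transformations, and they do not preserve the "leave literal strings untouched" structure the question mentions.

```haskell
import Data.Char (toLower)

-- hypothetical stand-in: collapse runs of whitespace
normalizeWS :: String -> String
normalizeWS = unwords . words

-- hypothetical stand-in: lower-case everything
normalizeCase :: String -> String
normalizeCase = map toLower

-- plain function composition now works
pipeline :: String -> String
pipeline = normalizeWS . normalizeCase
```

Given some `p :: Parser String`, `fmap pipeline p` would then apply both transformations to the parsed result.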

2 Answers


I solved the problem with the following approach; maybe there is a more elegant way:

preprocessor :: Parser String
preprocessor = normalizeCase `feeds` expandKettensatz `feeds` normalizeWs

feeds :: Parser String -> Parser String -> Parser String
feeds p1 p2 = do
  s <- p1
  setInput s
  p2
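As a sanity check, `feeds` can be exercised with simplified stand-ins for the real parsers. The definitions of `normalizeCase` and `normalizeWs` below are hypothetical placeholders, not the answer's actual implementations; see also the caveat about composability in the comment below.

```haskell
import Data.Char (toLower)
import Text.Parsec
import Text.Parsec.String (Parser)

-- hypothetical stand-ins for the real transformation parsers
normalizeCase :: Parser String
normalizeCase = map toLower <$> many anyChar

normalizeWs :: Parser String
normalizeWs = unwords . words <$> many anyChar

-- run p1 to completion, then re-parse its output with p2
feeds :: Parser String -> Parser String -> Parser String
feeds p1 p2 = do
  s <- p1
  setInput s
  p2

main :: IO ()
main = print $ parse (normalizeCase `feeds` normalizeWs) "" "Hello   WORLD"
```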
jules
  • This will not compose well: `feeds p1 p2 >> p3` does not behave like "consume what `p1` consumes, pass the result of that consumption to `p2`, then parse the remaining unconsumed input as `p3` would". And it is not easy to fix it to work that way, either, sadly; certainly the most obvious `getInput`/`setInput` gymnastics for fixing this particular case just makes the bug more subtle rather than actually working in all cases. – Daniel Wagner Oct 19 '17 at 20:53

If you have functions like

normalizeWhitespace :: Stream s m Char => ParsecT s u m String
normalizeCase :: Stream s m Char => Set String -> ParsecT s u m String

You could chain them together using `runParser` and `>>=`:

runBoth :: Stream s Identity Char => Set String -> SourceName -> s -> Either ParseError String
runBoth wordSet src input = do
  input <- runParser normalizeWhitespace () src input
  runParser (normalizeCase wordSet) () src input

But this doesn't give you a parser that you can chain together with other parsers.

This isn't terribly surprising, as parser composition in Parsec is all about composing parsers that operate on the same stream, whereas these operate on different streams.

Having multiple different streams is pretty common, too: feeding the output of a tokenization or lexing pass into the parser can make the process easier to understand, though Parsec is a little easier to use out of the box as a direct parser (without a separate lexing/tokenization step).
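One way to recover pipeline-style composition is to work at the level of whole passes (`String -> Either ParseError String`) and chain them with Kleisli composition (`>=>`) in the `Either` monad, generalizing the `runBoth` pattern above. The passes below are hypothetical stand-ins, not the question's real parsers.

```haskell
import Control.Monad ((>=>))
import Data.Char (toLower)
import Text.Parsec
import Text.Parsec.String (Parser)

-- run one whole pass over a string
runPass :: Parser String -> String -> Either ParseError String
runPass p = parse p "(pass)"

-- hypothetical stand-in passes
lowerPass, wsPass :: Parser String
lowerPass = map toLower <$> many anyChar
wsPass    = unwords . words <$> many anyChar

-- (>=>) chains passes left to right, short-circuiting on the first ParseError
pipeline :: String -> Either ParseError String
pipeline = runPass lowerPass >=> runPass wsPass
```

Each pass still consumes its own stream; only the results are threaded through, which sidesteps the composability problems of splicing `setInput` into a single parser.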

rampion