12

Mixing the lexing and parsing phases into a single phase sometimes makes Parsec parsers less readable, and it also slows them down. One solution is to use Alex as a tokenizer and then Parsec as a parser of the token stream.

This is fine, but it would be even better if I could get rid of Alex, because it adds a preprocessing phase to the compilation pipeline, doesn't integrate well with Haskell "IDEs", etc. I was wondering whether there is such a thing as a Haskell EDSL for describing tokenizers, very much in the style of Alex, but as a library.

Paul Brauner
  • This is a question I have been looking into lately, but I haven't really seen anything. I'm imagining maybe a RegEx EDSL from which we make an untagged tokenizer (:: [RegEx] -> String -> [String]). – Jason Reich Oct 13 '11 at 09:22
  • I could come up with a quick solution using any regexp library by trying to match the current string against each regexp, but I would lose a lot of Alex's optimizations, which rely on its knowledge of the set of all regexps. – Paul Brauner Oct 13 '11 at 09:39
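
The naive approach discussed in these two comments can be sketched roughly as follows. This is only an illustration: the `RegEx` type, `longestPrefix`, `tokenize` and `exampleRules` are hypothetical names, not taken from any existing library. Matching is done with Brzozowski derivatives, and each rule is tried independently against the remaining input, so none of the combined-DFA optimizations that Alex performs are available here. The result is wrapped in `Maybe` (unlike the signature suggested above) so that input no rule matches can be reported.

```haskell
import Data.Char (isAlpha, isDigit, isSpace)
import Data.Maybe (mapMaybe)

-- A tiny, hypothetical regular-expression EDSL (not a real library's API).
data RegEx
  = Empty               -- matches nothing
  | Eps                 -- matches only the empty string
  | Sym (Char -> Bool)  -- matches one character satisfying the predicate
  | Alt RegEx RegEx     -- alternation
  | Seq RegEx RegEx     -- sequencing
  | Star RegEx          -- zero or more repetitions

-- One or more repetitions.
plus :: RegEx -> RegEx
plus r = Seq r (Star r)

-- Does the expression accept the empty string?
nullable :: RegEx -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Sym _)   = False
nullable (Alt a b) = nullable a || nullable b
nullable (Seq a b) = nullable a && nullable b
nullable (Star _)  = True

-- Conservative check that the expression can no longer match anything.
dead :: RegEx -> Bool
dead Empty     = True
dead (Alt a b) = dead a && dead b
dead (Seq a b) = dead a || dead b
dead _         = False

-- Brzozowski derivative: what remains to be matched after reading c.
deriv :: Char -> RegEx -> RegEx
deriv _ Empty     = Empty
deriv _ Eps       = Empty
deriv c (Sym p)   = if p c then Eps else Empty
deriv c (Alt a b) = Alt (deriv c a) (deriv c b)
deriv c (Seq a b)
  | nullable a = Alt (Seq (deriv c a) b) (deriv c b)
  | otherwise  = Seq (deriv c a) b
deriv c (Star a)  = Seq (deriv c a) (Star a)

-- Length of the longest prefix of the input accepted by the expression.
longestPrefix :: RegEx -> String -> Maybe Int
longestPrefix = go 0 Nothing
  where
    go n best r s
      | dead r    = best
      | otherwise =
          let best' = if nullable r then Just n else best
          in case s of
               []       -> best'
               (c : cs) -> go (n + 1) best' (deriv c r) cs

-- Untagged tokenizer: at each position take the longest non-empty match
-- over all rules; Nothing if no rule matches the remaining input.
tokenize :: [RegEx] -> String -> Maybe [String]
tokenize _ [] = Just []
tokenize rules s =
  case filter (> 0) (mapMaybe (`longestPrefix` s) rules) of
    []   -> Nothing
    lens -> let (tok, rest) = splitAt (maximum lens) s
            in (tok :) <$> tokenize rules rest

-- Hypothetical rule set: numbers, identifiers, whitespace, operators.
exampleRules :: [RegEx]
exampleRules =
  [ plus (Sym isDigit)
  , plus (Sym isAlpha)
  , plus (Sym isSpace)
  , Sym (`elem` "+-*/()")
  ]
```

With these rules, `tokenize exampleRules "foo + 42"` yields `Just ["foo"," ","+"," ","42"]`. Every rule rescans the input from the current position, which is exactly the cost a generated DFA avoids.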

2 Answers

4

Yes - http://www.cse.unsw.edu.au/~chak/papers/Cha99.html

Before Hackage, Manuel used to release the code in a package called CTK (compiler toolkit). I'm not sure what the status of the project is these days.

I think Thomas Hallgren's lexer from the paper "Lexing Haskell in Haskell" was dynamic rather than a code generator. Whilst the release is tailored to lexing Haskell, the machinery in the library is more general. Iavor Diatchki has put the code on Hackage.

http://hackage.haskell.org/package/haskell-lexer

stephen tetley
3

You can use Parsec as the lexer too. First you parse the string into tokens, then you parse the tokens into the target data type.
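
A minimal sketch of this two-phase setup, assuming the `parsec` package and a made-up toy language of integer sums with parentheses. The `Token` type and the names `tokenize`, `satisfyTok`, `tok`, `number`, `expr`, `term` and `evalString` are illustrative choices, not part of Parsec itself; only the imported combinators come from `Text.Parsec`.

```haskell
import Text.Parsec

-- Hypothetical token type for a tiny arithmetic language.
data Token = TNum Integer | TPlus | TLParen | TRParen
  deriving (Eq, Show)

-- Phase 1: a Parsec parser over characters that produces positioned tokens.
type Lexer = Parsec String ()

-- Run a lexer, then skip trailing whitespace.
lexeme :: Lexer a -> Lexer a
lexeme p = p <* spaces

-- Lex a single token, remembering where it started.
token1 :: Lexer (SourcePos, Token)
token1 = lexeme $ (,) <$> getPosition <*> choice
  [ TNum . read <$> many1 digit
  , TPlus   <$ char '+'
  , TLParen <$ char '('
  , TRParen <$ char ')'
  ]

tokenize :: String -> Either ParseError [(SourcePos, Token)]
tokenize = parse (spaces *> many token1 <* eof) "<input>"

-- Phase 2: a Parsec parser over the token list.
type Parser = Parsec [(SourcePos, Token)] ()

-- Accept one token if the function returns Just; error positions are
-- taken from the source position stored with the next token.
satisfyTok :: (Token -> Maybe a) -> Parser a
satisfyTok f = tokenPrim (show . snd) nextPos (f . snd)
  where
    nextPos _ _ ((p, _) : _) = p
    nextPos p _ []           = p

-- Match one specific token.
tok :: Token -> Parser ()
tok t = satisfyTok (\t' -> if t == t' then Just () else Nothing)

number :: Parser Integer
number = satisfyTok $ \t -> case t of
  TNum n -> Just n
  _      -> Nothing

-- expr ::= term ('+' term)*
expr :: Parser Integer
expr = sum <$> term `sepBy1` tok TPlus

-- term ::= number | '(' expr ')'
term :: Parser Integer
term = number <|> (tok TLParen *> expr <* tok TRParen)

-- Run both phases: characters to tokens, then tokens to a value.
evalString :: String -> Either ParseError Integer
evalString s = tokenize s >>= parse (expr <* eof) "<tokens>"
```

Here `evalString "1 + (2 + 3)"` evaluates to `Right 6`. The token parser never has to deal with whitespace, which is the readability gain; the lexing phase is still an ordinary backtracking-free Parsec parser rather than a minimized DFA.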

Sjoerd Visscher
  • True but then again you lose the speed of minimal DFAs that you could get with a tool like Alex without losing any expressiveness (I prefer Parsec over, say, Yacc because it offers better modularity/expressiveness, but I'm not convinced this is very useful for lexers). But at least, it solves the problem of mixing the two phases. Thanks. – Paul Brauner Oct 13 '11 at 16:14