6

Say I have two languages (A & B). My goal is to write some type of program to convert the syntax found in A to the equivalent of B. Currently my solution has been to use Haskell's Parsec to perform this task. As someone who is new to Haskell and functional programming for that matter however, finding just a simple example in Parsec has been quite difficult. The examples I have found on the web are either incomplete examples (frustrating for a new Haskell programmer) or too much removed from my goal.

So can someone provide me with an amazingly trivial and explicit example of using Parsec for something related to what I'd like to achieve? Or perhaps even some tutorial that parallels my goal as well.

Thanks.

Vincent Russo
  • 1,087
  • 3
  • 20
  • 33
  • 1
    I feel your pain. you might want to investigate alex as well http://www.haskell.org/alex/doc/html/introduction.html if you don't want to mix your parsing and lexing – jberryman Jul 09 '12 at 19:38
  • 1
    Have a look at [Real world haskell](http://book.realworldhaskell.org/read/using-parsec.html) – fuz Jul 09 '12 at 19:42
  • 2
    People think that having an AST is enough to do *anything* really useful with most programming languages. This is simply false; you need a lot more machinery. A parser just doesn't do it. See my SO answer at http://stackoverflow.com/questions/3455456/how-to-translate-between-programming-languages/3460977#3460977. – Ira Baxter Jul 09 '12 at 20:29
  • 2
    @IraBaxter Of course, as you also observe, parsing *is* a prerequisite. It's nice to be warned about the spike pit just beyond the mustard gas, but you still might want help using a gas mask. – Daniel Wagner Jul 09 '12 at 21:56
  • 2
    I don't mind people getting a gas mask. I object to the naive idea that a gas mask is all somebody needs; nobody asks about the spike pit until they arrive, and then they pretty much act surprised. More importantly, after the surprise they naturally fall into the pit and die and nothing happens except a waste of time and precious resources. The problem is this lesson simply doesn't seem to be understood by the community. – Ira Baxter Jul 09 '12 at 22:05

1 Answers1

16

Consider the following simple grammar of a CSV document (In ABNF):

csv   = *crow
crow  = *(ccell ',') ccell CR
ccell = "'" *(ALPHA / DIGIT) "'"

We want to write a converter that converts this grammar into a TSV (tabulator separated values) document:

tsv   = *trow
trow  = *(tcell HTAB) tcell CR
tcell = DQUOTE *(ALPHA / DIGIT) DQUOTE

First of all, let's create an algebraic data type that descibes our abstract syntax tree. Type synonyms are included to ease understandment:

data XSV  = [Row]
type Row  = [Cell]
type Cell = String

Writing a parser for this grammar is pretty simple. We write a parser as if we would describe the ABNF:

csv :: Parser XSV
csv = XSV <$> many crow

crow :: Parser Row
crow = do cells <- ccell `sepBy` (char ',')
          newline
          return cells

ccell :: Parser Cell
ccell = do char '\''
           content <- many (digit <|> letter)
           char '\''
           return content

This parser uses do-notation. After a do, a sequence of statements follows. For parsers, these statements are simply other parsers. One can use <- to bind the result of a parser. This way, one builds a big parser by chaining multiple smaller parsers. To obtain interesting effects, one can also combine parser using special combinators (such as a <|> b, which parses either a or b or many a, which parses as many as as possible). Please be aware that Parsec does not backtrack by default. If a parser might fail after consuming characters, prefix it with try to enable backtracking for one instance. try slows down parsing.

The result is a parser csv that parses our CSV document into an abstract syntax tree. Now it is easy to turn that into another language (such as TSV):

xsvToTSV :: XSV -> String
xsvToTSV xst = unlines (map toLines xst) where
  toLines = intersperse '\t'

Connecting these two things one gets a conversion function:

csvToTSV :: String -> Maybe String
csvToTSV document = case parse csv "" document of
  Left _    -> Nothing
  Right xsv -> xsvToTSV xsv

And that is all! Parsec has lots of other functions to build up extremely sophisticated parsers. The book Real World Haskell has a nice chapter about parsers, but it's a little bit outdated. Most of that is still true, though. If you have further questions, feel free to ask.

fuz
  • 88,405
  • 25
  • 200
  • 352
  • 2
    Please note that this code is untested. If it doesn't work, please blame me and be verbose about what doesn't work. – fuz Jul 09 '12 at 20:19
  • "cell" in the grammar of the CSV document has to be "ccell" (second line), hasn't it? Could you tell me what "CR" in the grammar stands for? Where does your XSV parser takes care of that parsing a row is applied to every new line in the document? Thanks in advance! – efie Jul 09 '12 at 20:54
  • CR is carriage return; in this case, it just denotes a newline. Newlines vary between operating systems. – identity Jul 09 '12 at 20:57
  • The parser `crow` parses a full row, including a newline. `csv` simply applies the `crow` parser as often as possible, till it fails or all input is consumed. The `<$>` just wraps up the result. (I do this to avoid do-notation which would also be possible) – fuz Jul 09 '12 at 22:25
  • Please see [Wikipedia](http://en.wikipedia.org/wiki/ABNF) for the definition of `CR`. It's one of the predefined rules in ABNF. – fuz Jul 09 '12 at 22:28