Consider the following simple grammar of a CSV document (in ABNF):
csv = *crow
crow = *(ccell ",") ccell CR
ccell = "'" *(ALPHA / DIGIT) "'"
We want to write a converter that turns documents in this grammar into TSV (tab-separated values) documents:
tsv = *trow
trow = *(tcell HTAB) tcell CR
tcell = DQUOTE *(ALPHA / DIGIT) DQUOTE
First of all, let's create an algebraic data type that describes our abstract syntax tree. Type synonyms are included to make it easier to follow:
data XSV = XSV [Row]
type Row = [Cell]
type Cell = String
Writing a parser for this grammar is pretty simple. The parser mirrors the ABNF almost rule for rule (note that Parsec's newline matches '\n', so the code treats that as the row terminator):
import Text.Parsec
import Text.Parsec.String (Parser)
csv :: Parser XSV
csv = XSV <$> many crow
crow :: Parser Row
crow = do
  cells <- ccell `sepBy1` char ','  -- sepBy1: the grammar requires at least one cell per row
  newline
  return cells
ccell :: Parser Cell
ccell = do
  char '\''
  content <- many (digit <|> letter)
  char '\''
  return content
This parser uses do-notation. After a do, a sequence of statements follows. For parsers, these statements are simply other parsers, and <- binds the result of a parser to a name. This way, one builds a big parser by chaining smaller ones. Parsers can also be combined with special combinators, such as a <|> b, which tries a and falls back to b if a fails without consuming input, or many a, which applies a as often as possible. Please be aware that Parsec does not backtrack by default. If a parser might fail after consuming characters, wrap it in try to enable backtracking for that one parser. Use try sparingly, since the input it covers may have to be parsed again, which slows down parsing.
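To make the backtracking point concrete, here is a minimal sketch. The two keyword parsers are made up for illustration and are not part of the CSV example. Both alternatives start with the letter l, so the first one consumes input before it fails:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical example parsers, not part of the CSV grammar.

-- Without try: on the input "lambda", string "let" consumes the 'l'
-- and then fails on 'a'. Because input was consumed, <|> does not
-- attempt the second alternative and the whole parser fails.
keywordNoTry :: Parser String
keywordNoTry = string "let" <|> string "lambda"

-- With try: a failure inside try counts as "no input consumed",
-- so <|> goes on to attempt string "lambda".
keywordWithTry :: Parser String
keywordWithTry = try (string "let") <|> string "lambda"
```

In GHCi, parse keywordNoTry "" "lambda" gives a Left with a parse error, while parse keywordWithTry "" "lambda" gives Right "lambda".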
The result is a parser, csv, that parses our CSV document into an abstract syntax tree. From there it is easy to render the tree in another format (such as TSV):
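import Data.List (intercalate)
xsvToTSV :: XSV -> String
xsvToTSV (XSV rows) = unlines (map toLine rows) where
  toLine = intercalate "\t" . map quote  -- cells in a row are separated by a tab
  quote cell = "\"" ++ cell ++ "\""      -- the tcell rule wraps each cell in double quotes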
Combining the two gives a conversion function:
csvToTSV :: String -> Maybe String
csvToTSV document = case parse csv "" document of
  Left _ -> Nothing
  Right xsv -> Just (xsvToTSV xsv)
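For reference, the pieces can be assembled into a single file roughly like this. It is a sketch under a few assumptions of mine: XSV is wrapped in a newtype with an XSV constructor, rows end in '\n', cells are re-quoted with double quotes as the tcell rule asks, and an eof check is added so that trailing garbage is rejected instead of silently ignored.

```haskell
import Data.List (intercalate)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Abstract syntax tree shared by the CSV and TSV sides.
newtype XSV = XSV [Row]
type Row = [Cell]
type Cell = String

-- csv = *crow; the eof check is an addition that is not in the grammar.
csv :: Parser XSV
csv = XSV <$> many crow <* eof

-- crow = *(ccell ",") ccell, terminated here by '\n'.
crow :: Parser Row
crow = do
  cells <- ccell `sepBy1` char ','
  _ <- newline
  return cells

-- ccell = "'" *(ALPHA / DIGIT) "'"
ccell :: Parser Cell
ccell = do
  _ <- char '\''
  content <- many (digit <|> letter)
  _ <- char '\''
  return content

-- Render the tree as TSV: double-quoted cells joined by tabs, one row per line.
xsvToTSV :: XSV -> String
xsvToTSV (XSV rows) = unlines (map toLine rows)
  where
    toLine = intercalate "\t" . map quote
    quote cell = "\"" ++ cell ++ "\""

-- Parse and render in one step; Nothing signals a parse error.
csvToTSV :: String -> Maybe String
csvToTSV document = case parse csv "" document of
  Left _    -> Nothing
  Right xsv -> Just (xsvToTSV xsv)
```

Loaded into GHCi, csvToTSV "'a','b'\n'1','2'\n" evaluates to Just "\"a\"\t\"b\"\n\"1\"\t\"2\"\n", and csvToTSV "'a','b'" (no trailing newline) evaluates to Nothing. If you want to show the error message to the user, returning Either ParseError String instead of Maybe String is a small change.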
And that is all! Parsec has lots of other functions to build up extremely sophisticated parsers. The book Real World Haskell has a nice chapter about parsers, but it's a little bit outdated. Most of that is still true, though. If you have further questions, feel free to ask.