I want to create a parser combinator that collects all lines below the current position whose indentation level is greater than or equal to some i. I think the idea is simple:

Consume a line. If its indentation is:

  • ok -> do the same for the next lines
  • wrong -> fail
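
The intended behaviour can be sketched without any parser library, as a plain function over a list of lines. This is only an illustration of the goal, not uu-parsinglib code: the names `indentWidth` and `collectIndented` are hypothetical, and a tab is assumed to count as 4 columns, as in `indentOf` below.

```haskell
-- Indentation width of one line: a space counts as 1 and a tab as 4
-- (the same convention as indentOf in the question).
indentWidth :: String -> Int
indentWidth = sum . map width . takeWhile (`elem` " \t")
  where
    width '\t' = 4
    width _    = 1

-- Split the input at the first line indented by fewer than i columns.
-- Returns (accepted lines with indentation stripped, untouched remaining lines).
collectIndented :: Int -> [String] -> ([String], [String])
collectIndented i ls = (map strip accepted, rest)
  where
    (accepted, rest) = span ((>= i) . indentWidth) ls
    strip            = dropWhile (`elem` " \t")
```

For example, `collectIndented 1 (lines " a\na\n")` accepts the first line and leaves the second one unconsumed, which is what the parser below is meant to do for s2.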

Let's consider the following code:

import qualified Text.ParserCombinators.UU as UU
import           Text.ParserCombinators.UU hiding(parse)
import           Text.ParserCombinators.UU.BasicInstances hiding (Parser)

-- end of line
pEOL   = pSym '\n'

pSpace = pSym ' '
pTab   = pSym '\t'

indentOf s = case s of
    ' '  -> 1
    '\t' -> 4

-- return the indentation level (width of the whitespace at the beginning of the line; a tab counts as 4)
pIndent = (+) <$> (indentOf <$> (pSpace <|> pTab)) <*> pIndent `opt` 0

-- returns tuple of (indentation level, result of parsing the second argument)
pIndentLine p = (,) <$> pIndent <*> p <* pEOL

-- SHOULD collect all following lines whose indentation is greater than or equal to i
myParse p i = do
    (lind, expr) <- pIndentLine p
    if lind < i
        then pFail
        else do
            rest <- myParse p i `opt` []
            return $ expr:rest

-- sample inputs
s1 = " a\
   \\n a\
   \\n"

s2 = " a\
   \\na\
   \\n"

-- execution
pProgram = myParse (pSym 'a') 1 

parse p s = UU.parse ( (,) <$> p <*> pEnd) (createStr (LineColPos 0 0 0) s)

main :: IO ()
main = do 
    print $ parse pProgram s1
    print $ parse pProgram s2
    return ()

Which gives the following output:

("aa",[])
Test.hs: no correcting alternative found

The result for s1 is correct. For s2, the parser should consume the first "a" and then stop consuming. Where does this error come from?

Wojciech Danilo

1 Answer

The parsers you are constructing will always try to proceed; if necessary, input will be discarded or inserted. However, pFail is a dead end. It acts as a unit element for <|>.
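
What "unit element for <|>" means can be seen with the `Alternative` instance of `Maybe` from base. This is a minimal sketch, assuming pFail behaves like `empty`; it shows only the algebraic law and says nothing about how uu-parsinglib commits to an alternative once input has been consumed. The names `leftUnit`, `rightUnit` and `onlyFailure` are made up for the example.

```haskell
import Control.Applicative (empty, (<|>))

-- 'empty' stands in for pFail here (an assumption made for illustration).

-- Failure on the left is ignored: the other alternative is used.
leftUnit :: Maybe Int
leftUnit = empty <|> Just 1

-- Failure on the right is ignored as well.
rightUnit :: Maybe Int
rightUnit = Just 1 <|> empty

-- With no other alternative available, failure is all that remains.
onlyFailure :: Maybe Int
onlyFailure = empty <|> empty
```

The first two values are `Just 1`; the last one is `Nothing`, the dead end described above.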

In your parser, however, there is no other alternative available when the input does not conform to the language recognised by the parser. In your specification you say you want the parser to fail on input s2. Now it fails with a message saying that it fails, and you are surprised.

Maybe you do not want it to fail, but you want to stop accepting further input? In that case replace pFail by return [].

Note that the text:

do
    rest <- myParse p i `opt` []
    return $ expr:rest

can be replaced by (expr:) <$> (myParse p i `opt` [])

A natural way to solve your problem is probably something like

pIndented p = do i <- pGetIndent
                 (:) <$> p <* pEOL <*> pMany (pToken (replicate i ' ') *> p <* pEOL)

-- number of leading spaces on the current line
pGetIndent = length <$> pMany (pSym ' ')
rkrzr
  • Thank you, but it does not yet completely solve my problem (I've updated the code in the question). What if the indentation could consist of spaces or tabs, where a tab counts as 4 spaces? – Wojciech Danilo Aug 15 '13 at 13:05
  • Adding `return []` does **not** work as we want: if we replace `pFail` with `return []`, the second "a" will be consumed by the parser (it will be consumed and `[]` will be returned). I do not want the second "a" in example s2 to be consumed. – Wojciech Danilo Aug 15 '13 at 13:54
  • If you want to properly handle tabs (and not replace tabs by just four spaces) you will have to program your own small finite state machine which "knows" what a tab stands for. – Doaitse Swierstra Aug 17 '13 at 19:28
  • Could you please tell me more about how such a state machine could be used with uu-parsinglib? Additionally, you said that "`pFail` acts as a unit element for `<|>`", so (maybe I'm wrong) it still **should work**: please note that it is used in the expression `myParse p i \`opt\` []`, so if `myParse` fails, then `opt` should return `[]`, shouldn't it? – Wojciech Danilo Aug 17 '13 at 23:20
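
The "small finite state machine" mentioned in the comments could look roughly like the following. This is a sketch under assumptions, not code from the thread and not tied to uu-parsinglib: the state is the current column, a tab is assumed to advance to the next tab stop (every 4 columns), and the names `tabWidth`, `step` and `indentColumn` are hypothetical.

```haskell
-- Distance between tab stops (assumed; the thread only says a tab is 4 spaces).
tabWidth :: Int
tabWidth = 4

-- One transition of the state machine: the state is the current column.
step :: Int -> Char -> Int
step col '\t' = (col + tabWidth) - (col `mod` tabWidth)  -- jump to the next tab stop
step col _    = col + 1                                   -- a space advances one column

-- Column reached after the leading spaces and tabs of a line.
indentColumn :: String -> Int
indentColumn = foldl step 0 . takeWhile (`elem` " \t")
```

Unlike `indentOf` in the question, which always adds 4 for a tab, two spaces followed by a tab give column 4 here rather than 6. A parser could fold `step` over the whitespace characters it recognises and compare the resulting column with i.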