
I'm trying to learn the basics of Haskell while developing a filter for Pandoc to recursively include additional markdown files.

Based on the scripting guide I was able to create a somewhat working filter. This looks for CodeBlocks with the include class and tries to include the ASTs of the referenced files.

```include
section-1.md
section-2.md
#pleasedontincludeme.md
```

The whole filter and the input sources can be found in the following repository: steindani/pandoc-include (or see below).

You can run pandoc with the filter and view the output in markdown format using the following command: `pandoc -t json input.md | runhaskell IncludeFilter.hs | pandoc --from json --to markdown`

I've noticed that the map function (at line 38), although it gets the full list of files to include, only calls the function for the first element. This is not the only strange behavior. An included file can itself contain an include block, which is processed so that the referenced file is included; but it won't go any deeper: the include blocks of that last file are ignored.

Why doesn't the map function iterate over the whole list? Why does it stop after 2 levels of hierarchy?

Please note that I'm just starting to learn Haskell. I'm sure I made mistakes, but I'm happy to learn.

Thank you

Full source code:

module Text.Pandoc.Include where

import Control.Monad
import Data.List.Split

import Text.Pandoc.JSON
import Text.Pandoc
import Text.Pandoc.Error

stripPandoc :: Either PandocError Pandoc -> [Block]
stripPandoc p =
  case p of
    Left _ -> [Null]
    Right (Pandoc _ blocks) -> blocks

ioReadMarkdown :: String -> IO(Either PandocError Pandoc)
ioReadMarkdown content = return (readMarkdown def content)

getContent :: String -> IO [Block]
getContent file = do
  c <- readFile file
  p <- ioReadMarkdown c
  return (stripPandoc p)

doInclude :: Block -> IO [Block]
doInclude cb@(CodeBlock (_, classes, _) list) =
  if "include" `elem` classes
    then do
      files <- return $ wordsBy (=='\n') list
      contents <- return $ map getContent files
      result <- return $ msum contents
      result
    else
        return [cb]
doInclude x = return [x]

main :: IO ()
main = toJSONFilter doInclude
Dániel Stein
  • Well, the premise and title of your question is false; the result of `map` will depend on the whole list. There are no exceptions to that. So I can assure you that *something else* is the problem. Maybe the lists that you think have multiple items only have one? – Luis Casillas Dec 01 '15 at 22:20
  • @LuisCasillas I've printed the content of the map and it was correct, 5 Strings. I've also printed the name of the file and the AST in the `getContent` function and it was called only once (then once again for the next include block). – Dániel Stein Dec 01 '15 at 22:25
  • The result of applying `map` to a list of 5 items is going to be a list of 5 items, period. Now, Haskell has lazy evaluation, so not all items in the result list may be computed for whatever reason, and this might produce an illusion like what you describe. But really, trust me when I tell you this: the problem isn't in `map`. You are either giving `map` a different list than what you think you are, or you are consuming its result in a way other than what you think. – Luis Casillas Dec 01 '15 at 22:32
  • @LuisCasillas I understand, thank you. I only tried to write down what I've noticed; worded like this, it may be easier to find for someone else who is also learning the language. However, I've tried replacing every `return $` and the like with `return $!` and it still does not evaluate eagerly. Do you have any advice? – Dániel Stein Dec 01 '15 at 22:44
  • That's understandable Dániel. – Luis Casillas Dec 01 '15 at 22:50
  • Daniel, most of the `return`s in `doInclude` are unnecessary. Any line that goes `var <- return $ expr` can be replaced with `let var = expr`. – Luis Casillas Dec 01 '15 at 22:51
  • Those `x <- return $ foo` perform no IO; they are equivalent to `let x = foo`. I guess your `msum contents` is supposed to concatenate the contents list, i.e. to work in the list monad, but instead it is accidentally used in the IO monad "thanks" to overloading and typeclass instance selection. – chi Dec 01 '15 at 22:52
  • @chi I was about to point out the same thing, but I don't see where this `MonadPlus IO` instance would be coming from. – Luis Casillas Dec 01 '15 at 22:53
  • @LuisCasillas Thank you, I was aware of that; it was only a temporary solution for debugging. I've replaced the whole do block with `msum $! map getContent (wordsBy (=='\n') list)`. – Dániel Stein Dec 01 '15 at 23:00
  • @chi I'd like the `[IO [Block]]` to be concatenated into `IO [Block]`. Isn't the `msum` function the right choice? – Dániel Stein Dec 01 '15 at 23:02
  • @DánielStein Try `sequence contents`. Should give you an `IO [[Block]]`. Or `fmap concat (sequence contents) :: IO [Block]` – Luis Casillas Dec 01 '15 at 23:04
  • @DánielStein: Also, read [this question and its top answer](http://stackoverflow.com/questions/4504489/monadplus-definition-for-haskell-io). – Luis Casillas Dec 01 '15 at 23:07
  • @LuisCasillas `sequence contents` unfortunately does not solve the problem, `Expected type: [IO Block] Actual type: [IO [Block]]` – Dániel Stein Dec 01 '15 at 23:08
  • @LuisCasillas `fmap concat (sequence contents)` works perfectly, thank you very much. (Could you please extend your answer with this?) Also thank you @chi. – Dániel Stein Dec 01 '15 at 23:11

1 Answer

I can spot the following error in your `doInclude` function:

doInclude :: Block -> IO [Block]
doInclude cb@(CodeBlock (_, classes, _) list) =
  if "include" `elem` classes
    then do
      let files = wordsBy (=='\n') list
      let contents = map getContent files
      let result = msum contents            -- HERE
      result 
    else
        return [cb]
doInclude x = return [x]

Since the result type of this whole function is `IO [Block]`, we can work backward:

  1. `result` has type `IO [Block]`
  2. `contents` has type `[IO [Block]]`
  3. `msum` is being used with type `[IO [Block]] -> IO [Block]`

And that third part is the problem. Somehow, in your program, a non-standard `MonadPlus` instance is being loaded for `IO`, and I bet that what it does on `msum contents` is this:

  • Execute the first action
    • If that succeeds, produce the same result as that and discard the rest of the list. (This is the cause of the behavior you observe.)
    • If it fails with an exception, try the rest of the list.

This isn't a standard MonadPlus instance so it's coming from one of the libraries that you're importing. I don't know which.
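The behavior described in the bullets above can be reproduced with a small sketch. It assumes that some `MonadPlus IO` instance is in scope; newer versions of base (4.9 and later, as far as I know) ship one themselves, so no extra import should be needed there. `fakeGetContent` and `demoMsum` are hypothetical names made up for this illustration, and plain strings stand in for pandoc blocks.

```haskell
import Control.Monad (msum)
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Hypothetical stand-in for getContent: it records which file was
-- "read" and returns a single fake block for it.
fakeGetContent :: IORef [String] -> String -> IO [String]
fakeGetContent logRef file = do
  modifyIORef' logRef (++ [file])
  return ["contents of " ++ file]

-- Combine three actions with msum, then report both the result and
-- the list of files whose action was actually executed.
demoMsum :: IO ([String], [String])
demoMsum = do
  logRef  <- newIORef []
  result  <- msum (map (fakeGetContent logRef) ["a.md", "b.md", "c.md"])
  visited <- readIORef logRef
  return (result, visited)
```

Under that assumption, `demoMsum` should report that only `a.md` was visited: the first action succeeds, so the remaining ones are never run, which matches the "only the first element" symptom from the question.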

A general recommendation here would be:

  1. Split your program into smaller functions
  2. Write type signatures for those functions

The reason is that the problem here seems to be that `msum` is being used at a different type than the one you expect. Normally this would produce a type error, but here you got unlucky and it interacted with a strange type class instance in some library.


From the comments, your intent with `msum contents` was to create an IO action that executes all of the subactions in sequence and collects their results as a list. The `MonadPlus` class isn't normally defined for `IO`, and when it is, it does something else. So the correct function to use here is `sequence`:

-- Simplified version, the real one is more general:
sequence :: Monad m => [m a] -> m [a]
sequence [] = return []
sequence (ma:mas) = do
  a <- ma
  as <- sequence mas
  return (a:as)

That gets you from `[IO [Block]]` to `IO [[Block]]`. To eliminate the doubly nested list, use `fmap` to apply `concat` inside `IO`, as in `fmap concat (sequence contents)`.
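Putting the two steps together, a minimal sketch could look like the following. To keep it runnable without pandoc installed, `Block` is replaced by a plain `String` alias and `getContentStub` is a hypothetical stand-in for the question's `getContent`; `includeAll` and `includeAll'` are likewise names invented for this example. Only `sequence`, `mapM`, `concat` and `fmap` are real library functions.

```haskell
-- Assumption: a String alias stands in for pandoc's Block type.
type Block = String

-- Hypothetical stand-in for getContent: pretends every file
-- contains exactly two blocks.
getContentStub :: FilePath -> IO [Block]
getContentStub file =
  return [file ++ ": first block", file ++ ": second block"]

-- Run every action in order (sequence), then flatten the
-- resulting [[Block]] into [Block] inside IO (fmap concat).
includeAll :: [FilePath] -> IO [Block]
includeAll files = fmap concat (sequence (map getContentStub files))

-- Same behavior, written with mapM, since mapM f = sequence . map f.
includeAll' :: [FilePath] -> IO [Block]
includeAll' = fmap concat . mapM getContentStub
```

Calling `includeAll ["a.md", "b.md"]` runs the stub for both files and yields all four blocks in order. In the real filter, the same shape would apply with `getContent` in place of the stub.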

Luis Casillas