
This seems to be a popular question, but I am not familiar with Haskell at all, and reading the answers to similar questions hasn't helped me understand what needs to be done in my case.

My script looks like this:

import Text.Pandoc.JSON

pagebreakXml :: String
pagebreakXml = "<w:p><w:r><w:br w:type=\"page\"/></w:r></w:p>"

pagebreakBlock :: Block
pagebreakBlock = RawBlock (Format "openxml") pagebreakXml

blockSwapper :: Block -> Block
blockSwapper (Para [Str "<div class=\"docxPageBreak\"></div>"])  = pagebreakBlock
blockSwapper blk = blk

main = toJSONFilter blockSwapper

Compiling it results in these errors:

$ ghc --make docx-page-filter.hs -package-db=/Users/eugene/.cabal/store/ghc-8.8.3/package.db
[1 of 1] Compiling Main             ( docx-page-filter.hs, docx-page-filter.o )

docx-page-filter.hs:7:35: error:
    • Couldn't match expected type ‘Data.Text.Internal.Text’
                  with actual type ‘[Char]’
    • In the first argument of ‘Format’, namely ‘"openxml"’
      In the first argument of ‘RawBlock’, namely ‘(Format "openxml")’
      In the expression: RawBlock (Format "openxml") pagebreakXml
  |
7 | pagebreakBlock = RawBlock (Format "openxml") pagebreakXml
  |                                   ^^^^^^^^^

docx-page-filter.hs:7:46: error:
    • Couldn't match type ‘[Char]’ with ‘Data.Text.Internal.Text’
      Expected type: Data.Text.Internal.Text
        Actual type: String
    • In the second argument of ‘RawBlock’, namely ‘pagebreakXml’
      In the expression: RawBlock (Format "openxml") pagebreakXml
      In an equation for ‘pagebreakBlock’:
          pagebreakBlock = RawBlock (Format "openxml") pagebreakXml
  |
7 | pagebreakBlock = RawBlock (Format "openxml") pagebreakXml
  |                                              ^^^^^^^^^^^^

docx-page-filter.hs:10:25: error:
    • Couldn't match expected type ‘Data.Text.Internal.Text’
                  with actual type ‘[Char]’
    • In the pattern: "<div class=\"docxPageBreak\"></div>"
      In the pattern: Str "<div class=\"docxPageBreak\"></div>"
      In the pattern: [Str "<div class=\"docxPageBreak\"></div>"]
   |
10 | blockSwapper (Para [Str "<div class=\"docxPageBreak\"></div>"])  = pagebreakBlock
   |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
$

At first, I went with the accepted answer suggested here: Couldn't match expected type `Text' with actual type `[Char]'. Enabling OverloadedStrings reduced the number of errors to one, but compiling the script now results in this error, which I don't know how to fix:

$ ghc --make docx-page-filter.hs -package-db=/Users/eugene/.cabal/store/ghc-8.8.3/package.db -XOverloadedStrings
[1 of 1] Compiling Main             ( docx-page-filter.hs, docx-page-filter.o )

docx-page-filter.hs:7:46: error:
    • Couldn't match type ‘[Char]’ with ‘Data.Text.Internal.Text’
      Expected type: Data.Text.Internal.Text
        Actual type: String
    • In the second argument of ‘RawBlock’, namely ‘pagebreakXml’
      In the expression: RawBlock (Format "openxml") pagebreakXml
      In an equation for ‘pagebreakBlock’:
          pagebreakBlock = RawBlock (Format "openxml") pagebreakXml
  |
7 | pagebreakBlock = RawBlock (Format "openxml") pagebreakXml
  |                                              ^^^^^^^^^^^^
$

What should I change about my code to fix this issue?

oldhomemovie
  • Which version of the compiler are you using? Using ghci v8.6.5, I get this type signature for RawBlock: `RawBlock :: Format -> String -> Block`, and your code works out of the box. In any case, you can use the function [pack](https://hackage.haskell.org/package/text-1.2.4.0/docs/Data-Text.html#v:pack) to convert from String to Text. – jpmarinier Jul 01 '20 at 14:58
  • @jpmarinier pandoc finally made the switch from String to Text in version 2.8. You probably have an older version installed. – tarleb Jul 01 '20 at 17:01
  • More of a side comment: instead of using a Haskell filter, one could also adjust [this](https://github.com/pandoc/lua-filters/tree/master/pagebreak) Lua filter. This would avoid some data serialization and could be a bit faster. – tarleb Jul 01 '20 at 17:03
  • @tarleb Correct, I'm running Pandoc 2.5 as packaged by Fedora Linux 31. So apparently they are not above breaking backward compatibility. – jpmarinier Jul 01 '20 at 18:23
  • @tarleb As I understand it, that Lua filter works only on LaTeX files. I made a couple of attempts to adjust it, but it just won't accept HTML as input. Any idea why? – oldhomemovie Jul 01 '20 at 18:28
  • @gmile That's true, it's mostly aimed at Markdown input (which may include raw LaTeX). I'm happy to see from your post to the pandoc issue that you found a way to adapt it to your requirements. – tarleb Jul 01 '20 at 20:18
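The first comment's suggestion, converting with `pack`, is an alternative that lets `pagebreakXml` stay a `String`. The following is a minimal sketch, assuming pandoc-types 1.20 or later (the Text-based API that pandoc 2.8+ uses). `T.pack` converts the XML once, at the point where `RawBlock` needs a `Text`, while `OverloadedStrings` is still relied on for the `Format` literal and the `Str` pattern. The `module Main (main) where` header and the `main :: IO ()` signature are additions that are not in the original script.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.Text as T
import Text.Pandoc.JSON  -- re-exports Block, Inline, Format from Text.Pandoc.Definition

-- Left as a String, exactly as in the original script.
pagebreakXml :: String
pagebreakXml = "<w:p><w:r><w:br w:type=\"page\"/></w:r></w:p>"

-- T.pack :: String -> Text converts the XML where RawBlock expects Text.
-- The "openxml" literal becomes Text via OverloadedStrings.
pagebreakBlock :: Block
pagebreakBlock = RawBlock (Format "openxml") (T.pack pagebreakXml)

-- The literal in this pattern is a Text thanks to OverloadedStrings.
blockSwapper :: Block -> Block
blockSwapper (Para [Str "<div class=\"docxPageBreak\"></div>"]) = pagebreakBlock
blockSwapper blk = blk

-- An explicit IO () signature avoids ambiguity with newer toJSONFilter types.
main :: IO ()
main = toJSONFilter blockSwapper
```

Compared with changing the signature to `Text`, this keeps the rest of the code untouched, at the cost of one conversion call.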

1 Answer


In Haskell, there are multiple string types in common use. By default, string literals ("stuff in quotes") have type String (which is [Char]), but the library you're using expects values of type Text. Two changes are needed:

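  • Enable the OverloadedStrings extension so that string literals can also have type Text (either with the -XOverloadedStrings flag you already pass, or with a LANGUAGE pragma in the file).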

  • Change the type signature(s) from String to Text. Text can be imported from the Data.Text module in the text library. (It's also worth mentioning that there are two Text types, the other one being in Data.Text.Lazy; mixing them up could be another source of type mismatches for you in the future.)

{-# LANGUAGE OverloadedStrings #-}   -- Add at the top of the file

...  -- imports
import Data.Text (Text)              -- Import the Text type

pagebreakXml :: Text                 -- from String to Text
pagebreakXml = "<w:p>..."
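Putting both changes together with the rest of the question's script, a complete filter might look like the sketch below. It assumes pandoc-types 1.20 or later (the version pandoc 2.8+ builds on), in which `Format`, `RawBlock`, and `Str` all take `Text`. The `module Main (main) where` header and the `main :: IO ()` signature are additions not present in the original.

```haskell
{-# LANGUAGE OverloadedStrings #-}  -- must come before the module header and imports
module Main (main) where

import Data.Text (Text)
import Text.Pandoc.JSON  -- re-exports Block, Inline, Format from Text.Pandoc.Definition

-- Changed from String to Text; the literal is a Text via OverloadedStrings.
pagebreakXml :: Text
pagebreakXml = "<w:p><w:r><w:br w:type=\"page\"/></w:r></w:p>"

-- Both arguments are now Text, matching RawBlock :: Format -> Text -> Block.
pagebreakBlock :: Block
pagebreakBlock = RawBlock (Format "openxml") pagebreakXml

-- Replace a paragraph consisting only of the marker div with a page break.
blockSwapper :: Block -> Block
blockSwapper (Para [Str "<div class=\"docxPageBreak\"></div>"]) = pagebreakBlock
blockSwapper blk = blk

main :: IO ()
main = toJSONFilter blockSwapper
```

With the pragma in the file, the `-XOverloadedStrings` command-line flag is no longer necessary. The compiled executable is then passed to pandoc with `--filter`, for example `pandoc input.html --filter ./docx-page-filter -o output.docx` (file names here are placeholders).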
Li-yao Xia