22

How do I convert RTF (say, from stdin) to Markdown with a command-line tool under UNIX/OSX?

I am looking for something like pandoc. However, pandoc itself does not allow RTF as an input format. :-( So I'd be happy with either a tool similar to pandoc or a pointer to an external RTF reader for pandoc.

halloleo

3 Answers

21

On Mac OSX I can use the pre-installed textutil command for the RTF-to-HTML conversion, then convert via pandoc to markdown. So a command line which takes RTF from stdin and writes markdown to stdout looks like this:

textutil -stdin -convert html -stdout | pandoc --from=html --to=markdown
halloleo
  • 2
    This works terribly in my experience. `textutil` preserves none of my formatting and links, and the HTML is littered with useless classes. – zool Oct 05 '18 at 17:41
  • @zool You can avoid (or at least significantly minimise) the "class litter" by switching off some Pandoc extensions. I switch off `native_divs`, `native_spans`, `fenced_divs`, `header_attributes`, `auto_identifiers`, `inline_code_attributes`, `link_attributes` and `raw_attribute`. HTH, Leo – halloleo Oct 07 '18 at 11:08
  • I tried this script. The links in the clipboard are all stripped off. – Martin Oct 10 '21 at 18:12
  • @Martin The current version of pandoc seems to support RTF as an input format. Maybe try that. It should be better at preserving links. If it works, post it as an answer here please. – halloleo Oct 10 '21 at 23:39
  • I think the problem is not with pandoc but with textutil. I've found a [script](https://gist.github.com/rolandcrosby/c26571bf4e263f695d2f) that works (with minor changes). ```if encoded=`osascript -e 'the clipboard as «class HTML»'` 2>/dev/null; \ then echo $encoded \ | perl -ne 'print chr foreach unpack("C*",pack("H*",substr($_,11,-3)))' \ | pandoc --wrap=none -f HTML -t markdown; else pbpaste; fi```. To be honest, I don't understand the code. Maybe `<>` makes a difference. I changed it to `RTF` and the links are stripped off. – Martin Oct 11 '21 at 01:31
  • @Martin Sorry, I wasn't clear: I meant you could try using pandoc _instead_ of textutil to read the RTF file. Anyway, I'm glad you found another solution. – halloleo Oct 11 '21 at 05:11
  • @halloleo. Thanks. I did try pandoc before. The problem is that I need to convert the text from the clipboard. pandoc will simply strip off the links. – Martin Oct 11 '21 at 14:54
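The extension list from halloleo's comment above can be passed to pandoc as part of the format name, using pandoc's `FORMAT-EXTENSION` syntax. The following is only a sketch: the helper name `markdown_without` and the variable `TO_FORMAT` are made up for illustration, and how much of the "class litter" actually disappears will depend on the HTML that `textutil` emits.

```shell
#!/bin/sh
# Build a pandoc format string with the given extensions disabled.
# markdown_without is a hypothetical helper, not part of pandoc.
# Usage: markdown_without EXT...
markdown_without() {
    fmt=markdown
    for ext in "$@"; do
        fmt="$fmt-$ext"      # pandoc disables an extension with "-name"
    done
    printf '%s\n' "$fmt"
}

# The extensions named in the comment above.
TO_FORMAT=$(markdown_without native_divs native_spans fenced_divs \
    header_attributes auto_identifiers inline_code_attributes \
    link_attributes raw_attribute)

# Example use (needs textutil and pandoc, so it is left commented out):
#   textutil -stdin -convert html -stdout | pandoc --from=html --to="$TO_FORMAT"
```

Keeping the list in one place makes it easy to re-enable a single extension when checking which one is responsible for a given piece of markup.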
6

Using Ted and pandoc together, you should be able to do this:

Ted --saveTo text.rtf text.html
pandoc --from=html --to=markdown --output=text.md < text.html
miken32
  • Converting _rtf_ to _html_ can also easily be done with Apple's command **textutil** (see _man textutil_). Also have a look at http://stackoverflow.com/questions/1043768/quickly-convert-rtf-doc-files-to-markdown-syntax-with-php – Heinrich Giesen May 26 '15 at 09:25
  • @HeinrichGiesen Oops, didn't see your comment! Yes, that's what I found out as well: On OSX `textutil` is the way to go! – halloleo May 27 '15 at 03:51
  • 1
    That sounds like the best answer for OS X; your question said you were looking for a cross platform solution so I didn't consider it. Glad you figured something out. – miken32 May 27 '15 at 03:54
  • 1
    Ted 2.23 deb pkg is not installable on Debian 8.11, not even by dpkg command. – pimgeek Sep 05 '18 at 05:34
  • 1
    @pimgeek use the source – miken32 Sep 05 '18 at 12:49
  • Also no need to downvote a perfectly good answer just because you can't figure out how to get software installed. – miken32 Sep 05 '18 at 17:00
  • Even if I compile from source on my Debian machine, the dependencies are not satisfied ... I should have been more explicit in my previous comment. :-( – pimgeek Sep 10 '18 at 01:05
  • @miken32 I noticed that my Linux apt sources.list had some issues; after fixing that I will try Ted again. I should not have downvoted your answer while being too lazy to double-check. :-) – pimgeek Sep 14 '18 at 09:00
  • After I fixed the apt sources.list, I still get the following errors, so the dependency problem persists: `E: Package 'libjpeg8' has no installation candidate` `E: Package 'libtiff4' has no installation candidate` – pimgeek Sep 14 '18 at 10:17
4

Pandoc now supports RTF as an input format, so you can use:

cat file.rtf | pandoc --from=rtf --to=markdown
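Since older pandoc releases lack the RTF reader, a wrapper can check for it and fall back to the `textutil` route from the other answer. This is a rough sketch, not a tested cross-platform tool: the function names `choose_rtf_reader` and `rtf2md` are invented here, and the fallback branch assumes macOS, where `textutil` is available.

```shell
#!/bin/sh
# Reads a list of pandoc input formats (one per line) on stdin and prints
# "pandoc" if rtf is among them, otherwise "textutil".
# choose_rtf_reader is a hypothetical helper name.
choose_rtf_reader() {
    if grep -x 'rtf' >/dev/null; then
        echo pandoc
    else
        echo textutil
    fi
}

# rtf2md (also a made-up name): RTF on stdin, Markdown on stdout.
rtf2md() {
    case "$(pandoc --list-input-formats | choose_rtf_reader)" in
        pandoc)
            pandoc --from=rtf --to=markdown
            ;;
        textutil)
            # Assumption: running on macOS, where textutil is pre-installed.
            textutil -stdin -convert html -stdout |
                pandoc --from=html --to=markdown
            ;;
    esac
}
```

After sourcing the file, `rtf2md < file.rtf > file.md` would do the conversion with whichever reader is available.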
shreyasm-dev