11

I've searched high and low for a way to batch-process files with pandoc.

How do I convert a folder, including its nested folders, of HTML files to Markdown?

I'm using OS X 10.6.8.

Asclepius
rev

3 Answers

24

You can apply any command across the files in a directory tree using find:

find . -name \*.md -type f -exec pandoc -o {}.txt {} \;

This runs pandoc on every file with a .md suffix, creating a file with a .md.txt suffix next to it. (You will need a wrapper script, or some ugly subshell invocations, if you want a .txt suffix without the .md.) find replaces {} in any word between -exec and the terminating \; with the current filename. GNU and BSD find do this even when {} is part of a longer word such as {}.txt; POSIX only guarantees it for a standalone {}.
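The wrapper mentioned above can be an inline sh -c script rather than a separate file. The following is a minimal sketch for the HTML-to-Markdown case in the question, not a definitive recipe: the function name html_to_md_tree is made up for this example, and it assumes pandoc is on your PATH and that writing foo.md next to foo.html is what you want.

```shell
#!/bin/sh
# Sketch: convert every .html file under a directory tree to Markdown.
# html_to_md_tree is a hypothetical name; pandoc must be installed.
html_to_md_tree() {
  root=${1:-.}   # directory to walk; defaults to the current directory
  # find hands each matching path to the inline script as $1.
  # ${1%.html} strips the .html suffix before .md is appended,
  # so foo.html becomes foo.md instead of foo.html.md.
  find "$root" -type f -name '*.html' -exec sh -c '
    pandoc -f html -t markdown -o "${1%.html}.md" "$1"
  ' sh {} \;
}

# Usage (path taken from another answer in this thread):
#   html_to_md_tree ~/Sites/filesToMd
```

Because the path is passed as a quoted "$1", filenames containing spaces should survive intact.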

geekosaur
  • Many thanks. I assume I can replace `\*.md` and `.txt` with different extensions to convert the desired files, i.e. `\*.html` and `.md`? – rev Apr 25 '12 at 21:06
  • Yes, that's why I detailed what was going on, so you could see more easily which parts to change to do what you need. – geekosaur Apr 25 '12 at 21:12
  • I don't quite understand the following: "`{}` in any word from `-exec` to the terminating `\;` will be replaced by the filename." – rev Apr 25 '12 at 21:35
  • If you look at the sample command I provided, I used the character sequence `{}` twice: once to specify the input file, and once the output (with `.txt` appended). `find` replaces all instances of `{}` in an `-exec` command with the current filename. – geekosaur Apr 25 '12 at 21:37
  • Great answer. Just adapted this to batch convert .docx to .md with Markdown ATX headers and no line wrap: `find . -name \*.docx -type f -exec pandoc -o {}.md {} --wrap=none --atx-headers \;` – Dave Everitt Dec 31 '18 at 17:22
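To see the `{}` substitution discussed in these comments without converting anything, one option is to put echo in front of the command so find only prints what it would run. This is a small sketch: the function name preview_pandoc_commands is made up for illustration, and it relies on find replacing `{}` inside `{}.txt`, which GNU and BSD find do.

```shell
#!/bin/sh
# Dry run: print the pandoc command find would execute for each file.
# preview_pandoc_commands is a hypothetical name used only in this sketch.
preview_pandoc_commands() {
  root=${1:-.}   # directory to walk; defaults to the current directory
  # echo receives the already-substituted arguments, so each output line
  # shows both {} occurrences replaced by the current path.
  find "$root" -type f -name '*.md' -exec echo pandoc -o {}.txt {} \;
}

# Usage:
#   preview_pandoc_commands .
```

Once the printed commands look right, removing echo runs them for real.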
2

I made a bash script that does not work recursively; perhaps you could adapt it to your needs:

#!/bin/bash
newFileSuffix=md # we will make all files into .md
srcDir=~/Sites/filesToMd

# Glob instead of parsing ls, so filenames with spaces survive
for file in "$srcDir"/*.html
do
  [ -e "$file" ] || continue # skip the loop body if no .html files matched
  filename=$(basename "$file" .html) # remove directory and suffix
  newname=$filename.$newFileSuffix # make the new filename
#  echo "$newname" # uncomment this line to test for your directory, before you break things
  pandoc "$file" -o "$newname" # perform pandoc operation on the file,
                               # --output to newname (in the current directory)
done
# pandoc Catharsis.html -o test
lazaruslarue
-1

This builds upon the answer by geekosaur to avoid the .old.new extension and use just .new instead. Note that it runs silently, displaying no progress.

find . -type f -name '*.docx' -exec bash -c 'pandoc -f docx -t gfm "$1" -o "${1%.docx}".md' - '{}' \;

After the conversion, when you're ready to delete the original format:

find . -type f -name '*.docx' -delete
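Since the conversion runs silently, a failed file would go unnoticed before the delete. A more cautious variant, sketched here under the assumption that each converted file sits next to its original with a .md suffix, removes a .docx only when a non-empty .md counterpart exists. The function name delete_converted_docx is hypothetical.

```shell
#!/bin/sh
# Sketch: delete .docx originals only where a converted .md already exists.
# delete_converted_docx is a made-up name for this example.
delete_converted_docx() {
  root=${1:-.}   # directory to walk; defaults to the current directory
  find "$root" -type f -name '*.docx' -exec sh -c '
    md="${1%.docx}.md"
    # -s is true only if the file exists and is not empty
    if [ -s "$md" ]; then
      rm -- "$1"
    else
      echo "keeping $1 (no converted $md found)" >&2
    fi
  ' sh {} \;
}

# Usage:
#   delete_converted_docx .
```

Anything reported as kept can then be converted again or inspected by hand.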
Asclepius