I've searched high and low trying to work out how to batch-process files with pandoc.
How do I convert a folder, including its nested folders, of HTML files to Markdown?
I'm using OS X 10.6.8.
You can apply any command across the files in a directory tree using find:
find . -name \*.md -type f -exec pandoc -o {}.txt {} \;
This runs pandoc on every file with a .md suffix, creating a file with a .md.txt suffix. (You will need a wrapper script if you want a .txt suffix without the .md, or you'll have to do ugly things with subshell invocations.) Any {} in a word between -exec and the terminating \; is replaced by the filename.
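Here is a minimal sketch of the wrapper script mentioned above, applied to the question's HTML-to-Markdown case. The file name html2md.sh is an assumption, and the script assumes pandoc is on your PATH. It strips the .html suffix with bash's ${var%suffix} expansion, so foo.html becomes foo.md rather than foo.html.md.

```shell
#!/bin/bash
# html2md.sh: hypothetical wrapper script name, not part of pandoc.
# Converts each file given as an argument from NAME.html to NAME.md,
# writing the output next to the source file.
#
# Usage (from the top of the tree you want to convert):
#   chmod +x html2md.sh
#   find . -name '*.html' -type f -exec ./html2md.sh {} \;
html2md() {
    local src
    for src in "$@"; do
        # ${src%.html} removes a trailing ".html" from the path
        pandoc -f html -t markdown "$src" -o "${src%.html}.md"
    done
}

html2md "$@"
```

Because the loop accepts any number of arguments, the same script also works if your find supports the batching form -exec ./html2md.sh {} +.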
I wrote a bash script that doesn't work recursively, but perhaps you could adapt it to your needs:
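#!/bin/bash
# Converts every .html file directly inside ~/Sites/filesToMd (not subfolders)
# to Markdown, writing the .md files to the current directory.
srcDir=~/Sites/filesToMd
newFileSuffix=md                        # we will make all files into .md
for file in "$srcDir"/*.html; do        # a glob, unlike $(ls ...), copes with spaces in names
    [ -e "$file" ] || continue          # skip the literal pattern if there are no .html files
    name=${file##*/}                    # strip the directory part
    newname=${name%.html}.$newFileSuffix   # remove the .html suffix and add .md
    # echo "$newname"   # uncomment this line to test for your directory, before you break things
    pandoc "$file" -o "$newname"        # convert the file, --output to newname
done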
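One way to adapt the script above to nested folders is sketched below; it is not the original author's code, and the function name convert_tree is hypothetical. It assumes pandoc is on your PATH. The bash 3.x that ships with OS X has no ** recursive globbing (that arrived in bash 4), so the sketch lets find walk the tree and reads its NUL-separated output, which keeps paths containing spaces intact. Unlike the script above, it writes each .md next to its source and prints each file as it goes.

```shell
#!/bin/bash
# Hypothetical recursive variant: convert every .html file under a
# directory tree to .md, placing foo.md next to foo.html.
convert_tree() {
    local root=$1 src
    # find -print0 plus read -d '' handles spaces and newlines in names
    while IFS= read -r -d '' src; do
        echo "converting $src"
        pandoc -f html -t markdown "$src" -o "${src%.html}.md"
    done < <(find "$root" -type f -name '*.html' -print0)
}

if [ $# -eq 0 ]; then
    echo "usage: $0 DIRECTORY" >&2
else
    convert_tree "$1"
fi
```

For example, saved as an executable script, it could be run on the question's folder as: ./convert_tree.sh ~/Sites/filesToMd (the script file name is also an assumption).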
This builds upon the answer by geekosaur to avoid the .old.new extension and use just .new instead. Note that it runs silently, displaying no progress.
find . -type f -name '*.docx' -exec bash -c 'pandoc -f docx -t gfm "$1" -o "${1%.docx}".md' - '{}' \;
After the conversion, when you're ready to delete the original format:
find . -type f -name '*.docx' -delete
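The -delete command above removes every .docx regardless of whether its conversion succeeded. A more cautious sketch (an addition, not part of the answer above; the function name delete_converted is hypothetical) deletes an original only when a non-empty .md with the same base name sits next to it, and reports the files it keeps.

```shell
#!/bin/bash
# Hypothetical helper: delete NAME.docx only if a non-empty NAME.md exists
# alongside it; otherwise keep the original and say so on stderr.
delete_converted() {
    local root=$1 src
    while IFS= read -r -d '' src; do
        if [ -s "${src%.docx}.md" ]; then
            echo "deleting $src"
            rm -- "$src"
        else
            echo "keeping $src: no non-empty ${src%.docx}.md" >&2
        fi
    done < <(find "$root" -type f -name '*.docx' -print0)
}

if [ $# -eq 0 ]; then
    echo "usage: $0 DIRECTORY" >&2
else
    delete_converted "$1"
fi
```

The [ -s FILE ] test is true only for a file that exists and has a size greater than zero, so an empty output left by a failed pandoc run does not count as converted.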