I am trying to convert an entire directory from HTML to Markdown. The directory tree is quite deep, so there are files nested two and three levels down.

In answering this question, John MacFarlane suggested using the following Makefile:

TXTDIR=sources
HTMLS=$(wildcard *.html)
MDS=$(patsubst %.html,$(TXTDIR)/%.markdown, $(HTMLS))

.PHONY : all

all : $(MDS)

$(TXTDIR) :
    mkdir $(TXTDIR)

$(TXTDIR)/%.markdown : %.html $(TXTDIR)
    pandoc -f html -t markdown -s $< -o $@

Now, this doesn't seem to go inside subdirectories. Is there any easy way to modify this so that it will process the entire tree?

I don't need this to be in make. All I'm looking for is a way of getting a mirror of the initial directory where each html file is replaced by the output of running pandoc on that file.

(I suspect something along these lines should help, but I'm far from confident that I won't break things if I try to go at it on my own. I'm illiterate when it comes to GNU make.)

apc
  • If you don't know `make`, maybe you just try to write your own script in your favourite language, e.g. Python or Ruby? (sorry to not be of more help right now) – mb21 Oct 02 '14 at 14:23
  • Yeah, I may just try that instead. – apc Oct 15 '14 at 15:52

5 Answers

Since you mentioned you don't mind not using make, you can try bash.

I modified the code from this answer (this version converts Markdown files to PDF); run it from the parent directory:

find ./ -iname "*.md" -type f -exec sh -c 'pandoc "${0}" -o "${0%.md}.pdf"' {} \;

It worked when I tested it, so it should work for you.

As per the request "Any ideas how to specify the output folder?" (using html as the original file and md as the output; the `output` folder must already exist):

find ./ -iname "*.html" -type f -exec sh -c 'pandoc "${0}" -o "./output/$(basename "${0%.html}").md"' {} \;

I have tested this and it works for me.

Edit: As per a comment, the {} used with find's -exec option is a placeholder: find replaces it with the path of each file it finds, and here that path reaches the sh -c script as ${0}. The \; marks the end of the -exec command. See here for more explanation.
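The `find` command above writes every result into a single `./output` folder, so files that share a name in different subfolders would overwrite each other. If a true mirror of the tree is wanted, a minimal Python sketch could look like the one below. The helper names `mirrored_target` and `convert_tree` and the `run` parameter are made up for illustration; only the pandoc flags come from the question. Note that `rglob` matches case-sensitively, unlike `-iname`.

```python
import subprocess
from pathlib import Path


def mirrored_target(src, src_root, out_root):
    """Map src_root/a/b.html to out_root/a/b.md (hypothetical helper)."""
    return out_root / src.relative_to(src_root).with_suffix(".md")


def convert_tree(src_root, out_root, run=subprocess.run):
    """Convert every *.html under src_root, mirroring the tree under out_root.

    `run` is an illustrative hook so the pandoc call can be swapped out;
    by default it is subprocess.run.
    """
    src_root, out_root = Path(src_root), Path(out_root)
    written = []
    for src in sorted(src_root.rglob("*.html")):
        if out_root in src.parents:
            continue  # skip files that already live inside the output folder
        dst = mirrored_target(src, src_root, out_root)
        # pandoc does not create missing folders, so create them first
        dst.parent.mkdir(parents=True, exist_ok=True)
        run(
            ["pandoc", "-f", "html", "-t", "markdown", "-s", str(src), "-o", str(dst)],
            check=True,
        )
        written.append(dst)
    return written
```

For example, `convert_tree(Path("."), Path("output"))` would turn `docs/a/page.html` into `output/docs/a/page.md`, creating `output/docs/a` on the way.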

Luke W. Johnston
  • Thanks. Just to clarify: to get it to do what I want (viz. take `html` files and output `md` files) it should be: `find ./ -iname "*.html" -type f -exec sh -c 'pandoc "${0}" -o "${0%.html}.md"' {} \;`, right? Any ideas how to specify the output folder? (As it is, it just puts the `md` file in the same folder as the corresponding `html` one.) – apc Oct 15 '14 at 14:57
  • This results in the following error on my machine: `pandoc: : openFile: does not exist (No such file or directory)`. Files are found, but `${0}` appears to be empty. – Sam Tuke Jul 30 '17 at 12:29
  • What does the ` {} \;` at the end do? – Nate Glenn Feb 11 '18 at 11:37
  • Let me know if I should open a new question, but when I run this pandoc can't find any included files. For example, I have html and markdown files with images in subfolders (relative to the .md file). `` OR `![Notes/Untitled.png](Notes/Untitled.png)` Gives warning: `Could not fetch resource 'Notes/Untitled.png': PandocResourceNotFound "Notes/Untitled.png"` However, if I run pandoc directly on a markdown/html file it can find the image in the subfolder. – Kyle Ridolfo Jan 06 '20 at 17:28
  • @SamTuke Make sure to have a folder `output` as the generated files are directed to it. – M. Adam Dec 15 '21 at 11:57
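
On the missing-image warning raised in the comments: by default pandoc looks up relative resources starting from its working directory, so a file converted from the parent folder cannot see `Notes/Untitled.png` stored next to it. One possible workaround, sketched here in Python, is to run pandoc from each file's own folder. The function name `convert_in_place`, its `to_suffix` default, and the `run` parameter are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def convert_in_place(src, to_suffix=".pdf", run=subprocess.run):
    """Run pandoc from the source file's own directory (illustrative helper).

    Relative references such as Notes/Untitled.png are then resolved
    next to the document rather than from wherever the script started.
    """
    src = Path(src).resolve()
    dst = src.with_suffix(to_suffix)
    # Pass bare file names and let cwd point at the document's folder
    run(["pandoc", src.name, "-o", dst.name], cwd=src.parent, check=True)
    return dst
```

Calling `convert_in_place(p)` for every `p` in `Path(".").rglob("*.md")` would cover the whole tree.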

This is how I did it!

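# Convert every Markdown file under INPUT_FOLDER to HTML, mirroring the tree under ${DIR}/build
find "${INPUT_FOLDER}" -type f -name '*.md' -print0 |
while IFS= read -r -d '' item
do
  printf "   %s\n" "$item"
  # Create the output's parent folder (replaces the install -d / rm -Rf pair)
  mkdir -p "$(dirname "${DIR}/build/${item}")"
  pandoc "$item" -f markdown -t html -o "${DIR}/build/${item}.html"
done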
Pilk

I've created a Python script that converts all files under a folder tree that have a given suffix. It's called Pandoc-Folder. It might be useful, so I've put it on GitHub: https://github.com/andrewrproper/pandoc-folder

You can create a settings folder and file (YAML format), and then run it like this:

python pandoc-folder.py ./path/to/book/.pandoc-folder/settings-file.yml

There is an example-book folder, with matching .bat and .sh scripts, showing how to convert the Markdown from the example-book folder into a single output file.

I hope this might be useful to someone.

Andrew

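John MacFarlane's answer is almost right. However, the file list has to be collected recursively, and the output subfolder has to be created for pandoc in case it doesn't exist. This is how I'd do it:

TXTDIR=sources
# Collect HTML files at every depth (wildcard *.html only sees the top level)
HTMLS=$(patsubst ./%,%,$(shell find . -type f -name '*.html'))
MDS=$(patsubst %.html,$(TXTDIR)/%.markdown,$(HTMLS))

.PHONY : all

all : $(MDS)

# Recipe lines must start with a tab character
$(TXTDIR)/%.markdown : %.html
	mkdir -p $(dir $@)
	pandoc -f html -t markdown -s $< -o $@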
Jonathan Prieto-Cubides

This is a solution using ipython:

from pathlib import Path
files = list(Path('.').rglob('*.html'))
for f in files:
    out = f.with_suffix('.md')  # same folder and name as the HTML file, with .md
    !pandoc -s "{f}" -o "{out}"

Note that you must execute the command inside the directory where you keep the HTML files; each Markdown file will be saved next to its HTML source. To write the output elsewhere, just change the output path.

G M