
I found this question which had an answer to the question of performing batch conversions with Pandoc, but it doesn't answer the question of how to make it recursive. I stipulate up front that I'm not a programmer, so I'm seeking some help on this here.

The Pandoc documentation is slim on details regarding passing batches of files to the executable, and based on the script it looks like Pandoc itself is not capable of parsing more than a single file at a time. The script below works just fine in Mac OS X, but only processes the files in the local directory and outputs the results in the same place.

find . -name \*.md -type f -exec pandoc -o {}.txt {} \;

I used the following command to get something close to the result I was hoping for:

find . -name \*.html -type f -exec pandoc -o {}.markdown {} \;

This simple script, run using Pandoc installed on Mac OS X 10.7.4, converts all matching files in the directory I run it in to Markdown and saves them in the same directory. For example, a file named apps.html is converted to apps.html.markdown in the same directory as the source file.
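The doubled extension comes from `{}.markdown` being appended to the whole file name. A minimal sketch of one workaround, assuming a POSIX `sh` and `pandoc` on the `PATH`, hands each found file to a tiny inline shell script that strips `.html` before adding `.markdown`. The function name `html_to_markdown` is hypothetical. Note that `find` already walks subdirectories, so nested files are covered too.

```shell
#!/usr/bin/env bash
# html_to_markdown [ROOT] (hypothetical helper name): convert every .html file
# under ROOT (default: .) so that apps.html becomes apps.markdown next to it,
# rather than apps.html.markdown.
html_to_markdown() {
  find "${1:-.}" -type f -name '*.html' -exec sh -c '
    # "$1" is one file found by find; ${1%.html} drops its old extension
    pandoc -f html -t markdown "$1" -o "${1%.html}.markdown"
  ' sh {} \;
}
```

Run it as `html_to_markdown .` from the top of the site, or pass another directory.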

While I'm pleased that it makes the conversion, and it's fast, I need it to process all files located in one directory and put the Markdown versions in a set of mirrored directories for editing. Ultimately, these directories are in GitHub repositories: one branch is for editing, while another is for production/publishing. In addition, this simple script keeps the original extension and appends the new one to it. If I convert back again, it will add the HTML extension after the Markdown extension, and the file name would just grow and grow.
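For the mirrored-directory layout described above, one possible approach (a sketch, not a tested publishing workflow) is to compute each file's path relative to the source root and recreate that path under a separate destination. This assumes bash (for `read -d ''`) and `pandoc` on the `PATH`; `mirror_convert`, `SRC` and `DST` are hypothetical names.

```shell
#!/usr/bin/env bash
# mirror_convert SRC DST (hypothetical helper name):
# for every SRC/<path>/<name>.html, write DST/<path>/<name>.markdown,
# recreating the <path> subdirectories under DST as needed.
mirror_convert() {
  local src_root=${1%/} dst_root=${2%/} src rel dst
  # -print0 with read -d '' keeps file names containing spaces intact
  find "$src_root" -type f -name '*.html' -print0 |
  while IFS= read -r -d '' src; do
    rel=${src#"$src_root"/}              # e.g. apps/index.html
    dst="$dst_root/${rel%.html}.markdown"
    mkdir -p "$(dirname "$dst")"         # mirror apps/ under DST
    # </dev/null guards against pandoc reading the file list on stdin
    pandoc -f html -t markdown -s "$src" -o "$dst" </dev/null
  done
}
```

For example, `mirror_convert ./html ./markdown` would turn `html/apps/index.html` into `markdown/apps/index.markdown`.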

Technically, all I need to do is parse one branch's directory and sync it with the production one; then, when all changed, removed, and new content is verified correct, I can run commits to publish the changes. It looks like the find command can handle all of this, but I have no clue how to configure it properly, even after reading the Mac OS X and Ubuntu man pages.

Any kind words of wisdom would be deeply appreciated.

TC

Tyler Regas

3 Answers


Create the following Makefile:

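TXTDIR=sources
HTMLS=$(wildcard *.html)
MDS=$(patsubst %.html,$(TXTDIR)/%.markdown,$(HTMLS))

.PHONY : all

all : $(MDS)

$(TXTDIR) :
	mkdir -p $(TXTDIR)

# The directory is an order-only prerequisite (listed after |), so files
# being written into it do not make every target look out of date.
$(TXTDIR)/%.markdown : %.html | $(TXTDIR)
	pandoc -f html -t markdown -s $< -o $@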

(Note: the indented lines must begin with a TAB, not spaces; this may not come through in the above, since Markdown usually strips out tabs.)

Then you just need to type 'make', and it will run pandoc on every file with a .html extension in the working directory, producing a markdown version in 'sources'. An advantage of this method over using 'find' is that it will only run pandoc on a file that has changed since it was last run.

John MacFarlane
  • Wow! Thanks!! I've been watching this question since I posted it yesterday and only just found your answer. How odd, but much thanks. This looks really cool, though I don't exactly understand what it's doing. I see that you are defining TXTDIR, HTMLS, and MDS and that they have some kind of logic in them. I will run it up against a test copy of the original HTML. From what you describe, it only works on the files located in the directory that it is run in, yes? Thanks so much! – Tyler Regas Jun 14 '12 at 20:49
  • I figured out that it looks like perl, so I ran an update to make sure it was updated on my system. This is what I received:```GRID-Tyler-MBP:apps admin$ make --makefile=pandoc_h2m.makefile pandoc_h2m.makefile:13: warning: overriding commands for target `sources' pandoc_h2m.makefile:10: warning: ignoring old commands for target `sources' make: *** No rule to make target `%.html', needed by `sources'. Stop.``` – Tyler Regas Jun 14 '12 at 21:09
  • This works. My mistake was adding spaces where they did not belong, which caused the script to break. This is very, VERY awesome!! One note for anyone else trying to use this: in order for Perl to be used on the Mac, you first must have Xcode installed, then the command line tools, and then update Perl. I used the following command, found elsewhere on this wonderful site: ```sudo /usr/bin/perl -MCPAN -e 'install "JSON"'```. This instantiated Perl, applied updates, and then installed JSON, which is helpful anyway. – Tyler Regas Jun 14 '12 at 21:19
  • Well, that just goes to show how much of a developer I am NOT :D Thanks, John! Very, VERY much appreciated. – Tyler Regas Jun 14 '12 at 21:37
  • Why am I getting `Makefile:5: *** commands commence before first target. Stop.` – Hari K T Dec 13 '12 at 00:53
  • @JohnMacFarlane The make script doesn't do recursive conversion (HTML files inside subdirectories). Any suggestions for that? – Porcupine Jun 30 '18 at 22:50
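On the recursion question: `$(wildcard *.html)` only matches the current directory. A hedged shell sketch that keeps the Makefile's "only rebuild what changed" behaviour while descending into subdirectories uses the shell's `-nt` (newer than) file test in place of make's timestamp check. It assumes bash and `pandoc` on the `PATH`; `incremental_convert` is a hypothetical name.

```shell
#!/usr/bin/env bash
# incremental_convert SRC DST (hypothetical helper name): every .html file at
# any depth under SRC is mirrored to DST as .markdown, but pandoc only runs
# when the output is missing or older than its source, much like make.
incremental_convert() {
  local src_root=${1%/} dst_root=${2%/} src rel dst
  find "$src_root" -type f -name '*.html' -print0 |
  while IFS= read -r -d '' src; do
    rel=${src#"$src_root"/}
    dst="$dst_root/${rel%.html}.markdown"
    # [ a -nt b ] is true when a's modification time is newer than b's
    if [ ! -e "$dst" ] || [ "$src" -nt "$dst" ]; then
      mkdir -p "$(dirname "$dst")"
      pandoc -f html -t markdown -s "$src" -o "$dst" </dev/null
    fi
  done
}
```

Calling `incremental_convert . sources` a second time without editing anything should skip every file.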

Just for the record: here is how I achieved the conversion of a bunch of HTML files to their Markdown equivalents:

for file in *.html; do pandoc -f html -t markdown "${file}" -o "${file%html}md"; done

The -o argument uses shell parameter expansion: ${file%html} strips the trailing html from the file name, and md is appended in its place.
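The `%` operator in `${file%html}` removes the shortest matching suffix from the variable's value. Isolated as a hypothetical helper, `markdown_name`, so it can be tried without running pandoc:

```shell
#!/usr/bin/env bash
# markdown_name FILE (hypothetical helper): the same ${file%html}md expansion
# used in the loop above, on its own.
markdown_name() {
  local file=$1
  # %html removes the shortest trailing match of "html"; md is then appended
  printf '%s\n' "${file%html}md"
}

markdown_name apps.html              # prints apps.md
markdown_name "release notes.html"   # prints release notes.md
```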

Andre Steingress

To run the pandoc command-line tool recursively, you need to cd into the directory containing each file before running it (relative image paths are looked up from the current working directory); otherwise you will get an error like 'Could not fetch image xxx'. I tried all the answers above and they all had various problems, so I wrote a Node script to meet my needs (I use it to convert Markdown to docx). You can customize the script if you know a little JavaScript, or describe what you need and I will help when I have time: https://gist.github.com/Xheldon/dfc675c271c909dc3a6e94c869d6ebd4
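One way to get the same "run from the file's own directory" behaviour without a Node script is `find -execdir`, which both GNU find and the BSD find shipped with macOS provide. A sketch, assuming `pandoc` on the `PATH`; `md_to_docx_recursive` is a hypothetical name.

```shell
#!/usr/bin/env bash
# md_to_docx_recursive [ROOT] (hypothetical helper name): convert every .md
# under ROOT (default: .) to a .docx beside it. -execdir runs each command
# from inside the file's own directory, so relative image paths resolve there.
md_to_docx_recursive() {
  find "${1:-.}" -type f -name '*.md' -execdir sh -c '
    # $1 is the bare file name (post.md, or ./post.md with GNU find)
    pandoc "$1" -o "${1%.md}.docx"
  ' sh {} \;
}
```

GNU find refuses `-execdir` when `PATH` contains a relative entry such as `.`, so that may need cleaning up first.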

Xheldon Cao