I am attempting to convert PDF files in 2,432 subdirectories (one PDF file per folder) to HTML files.

For example, I have tried a few variations of

find . -type d | while read d; for file in *.pdf; do pdftohtml -c -i -s "$file"; done

and

for f in ./*/*.pdf; do pdftohtml -c -i -s "$file"; done

without any success. I have also tried some others, but I just can't get anything to work this time.

I know that part of the code works because I can put multiple PDF files in one folder and use

for file in *.pdf; do pdftohtml -c -i -s "$file"; done

to convert all of the files in that folder to HTML.

Is there a way that I can search through each folder and convert each file with a bash script? Or is this something I will have to do one folder at a time?

Tiago Martins Peres
JeremyC
  • Do you want to have the html files generated in the same directory as the source pdf file? – oliv Oct 18 '18 at 10:52
  • Certainly! Having the converted files in the same folder as the PDF is what I was hoping to do. Sorry, I forgot to include that above. – JeremyC Oct 18 '18 at 11:23

3 Answers

You can use the find command with the option -exec to trigger the conversion:

find /path/to/your/root/pdf/folder -type f -name "*.pdf" -exec bash -c 'pdftohtml -c -i -s "$1"' _ {} \;

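pdftohtml is executed once for every PDF file found. Note that find replaces {} with the path of the current PDF file; inside the bash -c command that path is available as $1, and the _ placeholder fills $0.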
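
To illustrate how the arguments reach the inner shell, here is a minimal dry-run sketch. The function name dry_run_convert is hypothetical and not part of any tool; it only prints the pdftohtml command that the find invocation above would run for each file, so it can be tried safely before converting anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show which pdftohtml commands would be run, without running them.
# The first argument is the root folder to search; it defaults to the current directory.
dry_run_convert() {
    local root="${1:-.}"
    # Inside the inner shell, "_" becomes $0 and the path substituted for {} becomes $1.
    find "$root" -type f -name "*.pdf" \
        -exec bash -c 'printf "would run: pdftohtml -c -i -s %s\n" "$1"' _ {} \;
}
```

Calling dry_run_convert /path/to/your/root/pdf/folder should print one line per PDF found. If the list looks right, swapping the printf for the real pdftohtml call gives the command shown above.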

oliv
  • Is there any advantage of using `-exec bash ...` over `-exec pdftohtml ... {}` ? – Socowi Oct 18 '18 at 11:13
  • @Socowi In this case maybe not, but in the general case where you need to execute a bash builtin command, the `bash -c` is required. – oliv Oct 18 '18 at 11:17
  • Well, this worked like a champ! I want to play with some of the other solutions for learning purposes, but this just flew right through and converted everything in each folder. Thanks much! – JeremyC Oct 18 '18 at 11:48

Your second command was almost right. There was just one small error:

for f in ./*/*.pdf; do pdftohtml -c -i -s "$file"; done

You wrote for f but used $file, so the loop variable was never the one passed to pdftohtml. Try:

for f in ./*/*.pdf; do pdftohtml -c -i -s "$f"; done
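
If it helps, the corrected loop can be wrapped in a small function. This is only a sketch: the name convert_one_level and the optional converter argument are my own additions (the converter defaults to pdftohtml and is there so the loop can be tried with a harmless stand-in). The extra test skips the case where no PDF matches and the shell leaves the pattern unexpanded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert every PDF that sits exactly one folder below root.
# $1 = root folder (default: current directory)
# $2 = converter command (default: pdftohtml); an assumption added for dry runs
convert_one_level() {
    local root="${1:-.}"
    local converter="${2:-pdftohtml}"
    local f
    for f in "$root"/*/*.pdf; do
        # When nothing matches, the glob stays a literal string; skip it.
        [ -e "$f" ] || continue
        "$converter" -c -i -s "$f"
    done
}
```

For example, convert_one_level . echo prints the arguments that would be passed for each PDF, and convert_one_level . runs the real conversion.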
Socowi

Use:

find . -name \*.pdf -exec pdftohtml -c -i -s {} \;
  • I want to try this, as well. What is the purpose of escaping the *.pdf? I'm still learning bash, regex, etc., and don't think I've come across anything that has used an escape yet. That makes it really short and clean! – JeremyC Oct 18 '18 at 11:43
  • Well, I had some trouble in the past with some files, so I escape it. You can also write "*.pdf" in quotes instead of using the backslash. – Incrivel Monstro Verde Oct 18 '18 at 11:54
  • See more about escaping: https://stackoverflow.com/questions/15783701/which-characters-need-to-be-escaped-when-using-bash – Incrivel Monstro Verde Oct 18 '18 at 13:12
  • Got it. Many years ago a PHP mentor taught me the habit of prevention (a few keystrokes now can save you from a million later on.) Seems to apply here, as well. Thank you for the info and the link! – JeremyC Oct 18 '18 at 19:28
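
On the escaping question, a small sketch may make the difference visible. The helper names show_args and glob_demo are hypothetical; the demo works in a scratch directory created with mktemp and compares what the shell hands to a command when the pattern is unquoted, backslash-escaped, and quoted.

```shell
#!/usr/bin/env bash
# Print each argument on its own line in brackets, to show what the shell passed on.
show_args() {
    printf '[%s]\n' "$@"
}

# Hypothetical demo, run in a subshell so the directory change does not leak out.
glob_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    touch a.pdf b.pdf
    echo "unquoted:"; show_args *.pdf     # the shell expands the pattern to the matching files
    echo "escaped:";  show_args \*.pdf    # the literal pattern *.pdf is passed through
    echo "quoted:";   show_args "*.pdf"   # same effect as the backslash
    rm -rf "$dir"
)
```

With an unquoted pattern, find would receive the expanded file names instead of the pattern whenever the current directory happens to contain matching files, which is the kind of trouble escaping or quoting avoids.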