
I have over a thousand PDF files in a directory that I want to convert to text files. I use code like the following to first replace the spaces in the file names with underscores and then convert the files to text:

    #!/bin/bash

    find . -name '*.pdf' | while IFS= read -r file; do
        # Replace every space in the path with an underscore
        target=$(echo "$file" | sed 's/ /_/g')
        echo "Renaming '$file' to '$target'"
        mv "$file" "$target"
        chmod 777 *.pdf
        pdftotext -layout "$target" "$target.txt"
    done

However, this code converts a file like "I love you.pdf" to "I_love_you.pdf.txt". I want to remove the .pdf part so that the final name is "I_love_you.txt".

hlosukwakha

4 Answers


My preferred way of doing this is to use bash's pattern substitution to replace the extension:

pdftotext -layout "$target" "${target/%.pdf/.txt}"

The % anchors the pattern, so .pdf is replaced only where it appears at the end of the string.
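A small sketch of how that expansion behaves on a few names, assuming bash (the `${var/%pattern/replacement}` form is a bash extension, not POSIX sh). `txt_name` is a hypothetical helper used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: derive the .txt name from a .pdf name with bash pattern substitution.
# ${var/%pattern/replacement} replaces pattern only where it matches at the END of $var.

txt_name() {
    local target=$1
    printf '%s\n' "${target/%.pdf/.txt}"
}

txt_name "I_love_you.pdf"     # prints I_love_you.txt
txt_name "a.pdf.notes.pdf"    # prints a.pdf.notes.txt (only the trailing .pdf changes)
txt_name "report.PDF"         # prints report.PDF (the match is case-sensitive by default)

# The POSIX form ${target%.pdf}.txt (remove the suffix, then append) gives the
# same result for names ending in .pdf, but appends .txt to any other name.
```

In the question's loop this would replace `"$target.txt"` as the second argument to pdftotext.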

nneonneo

Your problem is that $target already ends in .pdf: if target is "I_love_you.pdf", then "$target.txt" expands to "I_love_you.pdf.txt".

Note that if you don't supply the second argument to pdftotext, it converts file.pdf to file.txt by default, which seems to be exactly what you need.
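A minimal sketch of the conversion step with the output name left to pdftotext. It assumes bash, and it leaves out the question's renaming step so only the conversion is shown. `convert_all` is a hypothetical name; the `-print0` / `read -d ''` pairing is an added assumption so that names containing spaces survive the pipe intact (GNU and BSD find support `-print0`).

```shell
#!/usr/bin/env bash
# Sketch: convert every PDF under the current directory, letting pdftotext
# choose the output name. With only the PDF argument, pdftotext writes the
# text next to it, with the .pdf extension replaced by .txt.

convert_all() {
    # Stop early with a message if pdftotext is not available
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found" >&2
        return 1
    fi
    # -print0 / read -d '' keep file names with spaces intact
    find . -name '*.pdf' -print0 |
    while IFS= read -r -d '' file; do
        pdftotext -layout "$file"     # ./I love you.pdf -> ./I love you.txt
    done
}
```

Run `convert_all` from the directory that holds the PDFs.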

Simon MᶜKenzie

Use basename, e.g.:

basename "I_love_you.pdf" .pdf prints "I_love_you"

See How do I remove the file suffix and path portion from a path string in Bash?
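A sketch of how basename could build the output name inside a loop, assuming the PDFs may sit in subdirectories (as they can with `find`). Because basename also strips the directory, this pairs it with dirname to put the directory back. `txt_target` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: build the .txt output path with dirname + basename.
# basename strips the directory and, given a second argument, that suffix too.

txt_target() {
    local pdf=$1
    local dir name
    dir=$(dirname -- "$pdf")          # ./docs/I_love_you.pdf -> ./docs
    name=$(basename -- "$pdf" .pdf)   # ./docs/I_love_you.pdf -> I_love_you
    printf '%s/%s.txt\n' "$dir" "$name"
}

txt_target "./I_love_you.pdf"        # prints ./I_love_you.txt
txt_target "./docs/I_love_you.pdf"   # prints ./docs/I_love_you.txt
```

In the question's loop, the call would then read `pdftotext -layout "$target" "$(txt_target "$target")"`. This costs two extra processes per file compared with pure parameter expansion, which is negligible next to pdftotext itself.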

Pete855217

Another option is to let find run pdftotext once per file, without an output name, so each file.pdf becomes file.txt:

find ./ -name "*.pdf" -exec pdftotext {} \;
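If you also want the question's renaming step, a sketch that folds it into the same find call is below. It assumes bash, and it assumes only the file name (not the directory names) should lose its spaces. `rename_and_convert` is a hypothetical helper; `-exec ... {} +` hands many paths to one bash process, whose loop handles each file.

```shell
#!/usr/bin/env bash
# Sketch: rename (spaces -> underscores in the file name only) and convert
# in one find invocation. Takes the start directory as $1 (default: .).
rename_and_convert() {
    find "${1:-.}" -name '*.pdf' -exec bash -c '
        for file do
            dir=$(dirname -- "$file")
            base=$(basename -- "$file")
            target="$dir/${base// /_}"
            # Skip mv when there is nothing to rename (mv would complain)
            [ "$file" = "$target" ] || mv -- "$file" "$target"
            pdftotext -layout "$target"   # writes the .txt next to the .pdf
        done
    ' bash {} +
}
```

Usage: `rename_and_convert .` from the directory with the PDFs. Note that the inner script must not contain single quotes, since the whole script is passed as one single-quoted argument.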

Nick