
I am trying to write a shell script that takes various non-human-readable files and converts them to something human-readable.

I have the following code:

#!/usr/bin/sh

# Used to convert annoying files to plaintext

case $1 in
  *.pdf)
    tmp_file=$(mktemp)
    pdftotext "$1" "$tmp_file"
    echo "$tmp_file"
    ;;
  *)
    echo "$1"
    ;;
esac

When I run this as `totext some.pdf`, the name of a temporary file is printed, as expected, and running `cat` on that file gives me the expected text.

However, when I run `totext some.pdf | cat`, I still get the temporary file's name printed and I do not see its contents.

How do I make the script's output be taken up by the pipe and passed as an argument to the next program (in this case, `cat`)?

Edward
  • Run `sh -x yourscript` to log the script's operations to stderr. BTW, you've got a bunch of unrelated quoting bugs here; run your code through http://shellcheck.net/ and fix what it finds. (Yes, `echo $1` really is buggy; see [I just assigned a variable, but `echo $variable` shows something else!](https://stackoverflow.com/questions/29378566/i-just-assigned-a-variable-but-echo-variable-shows-something-else)) – Charles Duffy Mar 23 '21 at 19:57
  • Piping to stdin is not the same thing as passing an argument. Use `cat "$(totext some.pdf)"`. – tkausl Mar 23 '21 at 19:57
  • Indeed -- what tkausl said; I should have paid more attention. `totext some.pdf | cat` is _expected_ to print the filename. `echo hello | cat` prints `hello`, not the contents of a file named `hello`, so this is consistent and in keeping with normal behavior. – Charles Duffy Mar 23 '21 at 19:58
  • `totext some.pdf | xargs -d $'\n' cat` is a better-practice example; `cat $(totext some.pdf)` is buggy -- see what happens if your temporary directory has spaces in its name. (cc: @tkausl re: above). – Charles Duffy Mar 23 '21 at 19:59
  • (That said, why are you having your program output temporary filenames instead of having it output text in the first place? When you do that you force someone else to do later work to _clean up_ those filenames, so it's not particularly good hygiene). – Charles Duffy Mar 23 '21 at 20:00
  • @CharlesDuffy You're right, should work with `"` though, unless one expects multiple file names. I personally don't like using xargs when I expect one and exactly one argument being returned, otherwise xargs would be the way to go. – tkausl Mar 23 '21 at 20:01
  • I’m voting to close this question because it shows tools working as they are designed and documented to work; no unexpected behavior is shown, and no clear and explicit question is asked. – Charles Duffy Mar 23 '21 at 20:02
  • @CharlesDuffy This will be one bit of a larger script. I take your point re not using temporary files and will make this change. I've made an edit to clarify the question and take into account the shellcheck output – Edward Mar 23 '21 at 20:20
  • @Edward : Since `cat` without arguments just copies stdin to stdout, a construct such as `someprogram | cat` is useless. – user1934428 Mar 24 '21 at 08:05
  • Well, useless unless you want to force stdout to be a FIFO instead of a TTY/seekable file/etc. `| cat` does occasionally have a use. – Charles Duffy Mar 24 '21 at 12:43
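
The distinction drawn in the comments above (data arriving on stdin versus a filename passed as an argument) can be seen without `pdftotext` at all. The following is only a minimal sketch: it uses a throwaway file from `mktemp`, and the variable names `piped` and `as_arg` are made up for illustration.

```sh
#!/bin/sh
# Create a throwaway file whose contents differ from its name.
tmp_file=$(mktemp)
printf 'file contents\n' > "$tmp_file"

# Pipe: cat receives the *name* on stdin and copies it through unchanged.
piped=$(printf '%s\n' "$tmp_file" | cat)

# Argument: cat opens the file with that name and prints what is inside.
as_arg=$(cat "$(printf '%s\n' "$tmp_file")")

printf 'piped:  %s\n' "$piped"
printf 'as arg: %s\n' "$as_arg"

rm -f "$tmp_file"
```

The first line of output shows the temporary file's path, the second shows its contents, which mirrors what `totext some.pdf | cat` and `cat "$(totext some.pdf)"` do respectively.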

1 Answer

The problem was solved by using `xargs`, which turns the lines it reads on stdin into arguments for the command it runs: `totext some.pdf | xargs cat`
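
An alternative raised in the comments on the question is to have the script write the converted text to stdout itself, so that no temporary file has to be cleaned up and an ordinary pipe works. The sketch below is one possible shape for that, not a definitive version: it wraps the logic in a function named `totext`, relies on `pdftotext` accepting `-` as the output name to mean stdout, and assumes that any non-PDF input is already plain text and can simply be passed through `cat`.

```sh
#!/bin/sh
# totext: print a plain-text rendering of the file named by $1 on stdout.
totext() {
  case $1 in
    *.pdf)
      # "-" as the output file asks pdftotext to write to stdout.
      pdftotext "$1" -
      ;;
    *)
      # Assumption: anything else is already readable, so pass it through.
      cat -- "$1"
      ;;
  esac
}

# Usage (hypothetical file name): totext some.pdf | less
```

To use this as a standalone script rather than a function, the file would end with a call such as `totext "$1"`.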

Edward
  • Note that `xargs cat` has some bugs that can show up if you need to handle arbitrary filenames -- it doesn't work line-by-line, but instead word-by-word, using word-splitting rules that are slightly like how a shell does it but not quite. This is why I generally recommend `xargs -d $'\n' cat` (if you're targeting GNU-style systems and know your filenames cannot contain newlines), or changing your output to be NUL-delimited and using `xargs -0 cat`. – Charles Duffy Mar 23 '21 at 21:36