
I'm trying to use enscript to print PDFs from Mutt, and hitting character encoding issues. One way around them seems to be to just use sed to replace the problem characters: sed -ir 's/[“”]/"/g' {input}

My test input file is this:

“very dirty”    
we’re 

I'm hoping to get "very dirty" and we're but instead I'm still getting

â\200\234very dirtyâ\200\235
weâ\200\231re

I found a nice little post on printing to PDFs from Mutt that I used as a starting point. I have a bash script that I point to from my .muttrc with set print_command="$HOME/.mutt/print.sh" -- the script currently reads about like this:

#!/bin/bash
input="$1" pdir="$HOME/Desktop" open_pdf=evince


# Straighten out curly quotes

sed -ir 's/[“”]/"/g' $input
sed -ir "s/[’]/'/g" $input


tmpfile="`mktemp $pdir/mutt_XXXXXXXX.pdf`"
enscript --font=Courier8 $input -2r --word-wrap --fancy-header=mutt -p - 2>/dev/null | ps2pdf - $tmpfile
$open_pdf $tmpfile >/dev/null 2>&1 &
sleep 1
rm $tmpfile

It does a fine job of creating a PDF (and works fine if you give it a file as an argument) but I can't figure out how to fix the curly quotes.

I've tried a bunch of variations on the sed line:

input=sed -r 's/[“”]/"/g' $input

$input=sed -ir "s/[’]/'/g" $input

Per the suggestion at "Can I use sed to manipulate a variable in bash?", I also tried input=$(sed -r 's/[“”]/"/g' <<< $input) and got an error: "Syntax error: redirection unexpected"

But none manages to actually change $input -- what is the correct syntax to change $input with sed?

Note: I accepted an answer that resolved the question I asked, but as you can see from the comments there are a couple of other issues here. enscript is taking in a whole file as a variable, not just the text of the file. So trying to tweak the text inside the file is going to take a few extra steps. I'm still learning.

Amanda
  • 2
    Maybe you need to set your locale http://stackoverflow.com/questions/27072558/sed-and-utf-8-encoding. – Stargateur Nov 19 '16 at 22:11
  • 1
    Note that if you want these quotes to actually impact how arguments are grouped (that is, to have semantic meaning), it's too late to fix that after your script has been started -- they've already been treated as literal rather than syntactic. – Charles Duffy Nov 19 '16 at 22:14
  • 1
    Also, consider using lower-case variable names -- per the POSIX spec on environment variable names, all-caps names are reserved for variables that modify behavior of the system or shell, whereas names with at least one lower-case character are reserved for application use. Since setting a shell variable with a name that overlaps an environment variable overwrites the latter, these conventions necessarily apply in both places. – Charles Duffy Nov 19 '16 at 22:15
  • ...you might also consider running your code through http://shellcheck.net/ – Charles Duffy Nov 19 '16 at 22:15
  • 1
    @Stargateur I will edit, but it works fine at the command line. It just isn't altering – Amanda Nov 19 '16 at 22:18
  • Err. If you're using `#!/usr/bin/env sh`, then this is a POSIX sh script, not a bash script. That's a critical difference -- `<<<` isn't guaranteed to be available in POSIX sh, for instance. Either change the question's tag from `bash` to `sh`, or change the shebang from `#!/usr/bin/env sh` to `#!/usr/bin/env bash`. – Charles Duffy Nov 19 '16 at 22:25
  • @Amanda, ...if it works at the command line, that's a strong indication that the shebang is at fault -- your command-line interpreter which you're testing with is presumably `bash`, not `sh`. – Charles Duffy Nov 19 '16 at 22:37
  • @Amanda Please read my answer in full; all the concerns you raise in the Note have been addressed in the script at the end, which runs under either `sh` or `bash`. –  Nov 20 '16 at 04:04
  • I addressed the local variable naming, and switched over to bash, to reduce distraction. – Amanda Nov 20 '16 at 06:25

2 Answers


On Editing Variables In General

BashFAQ #21 is a comprehensive reference on performing search-and-replace operations in bash, including within variables, and is thus recommended reading. On this particular case:

Use the shell's native string manipulation instead; this performs far better than forking off a subshell, launching an external process inside it, and reading that external process's output. BashFAQ #100 covers this topic in detail, and is well worth reading.

Depending on your version of bash and configured locale, it might be possible to use a bracket expression (i.e. [“”], as your original code did). However, the most portable approach is to treat “ and ” separately, which will work even without multi-byte character support available.

input='“hello ’cruel’ world”'
input=${input//'“'/'"'}
input=${input//'”'/'"'}
input=${input//'’'/"'"}
printf '%s\n' "$input"

...correctly outputs:

"hello 'cruel' world"

On Using sed

To provide a literal answer -- you almost had a working sed-based approach in your question.

input=$(sed -r 's/[“”]/"/g' <<<"$input")

...adds the missing syntactic double quotes around the parameter expansion of $input, ensuring that it's treated as a single token regardless of how it might be string-split or glob-expanded.
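The "Syntax error: redirection unexpected" message in the question is what a plain POSIX sh typically prints for `<<<`, which is a bash extension. As a minimal sketch for the case where the script has to stay sh-compatible, the value can be piped into sed instead. The function name `straighten_quotes` is hypothetical, and each character gets its own substitution so the result does not depend on multi-byte bracket expressions:

```shell
#!/bin/sh
# Sketch: straighten curly quotes held in a variable without using <<<.
# straighten_quotes is a hypothetical helper name, not part of the question.
straighten_quotes() {
    # One substitution per character; no multi-byte bracket expression needed.
    printf '%s\n' "$1" | sed -e 's/“/"/g' -e 's/”/"/g' -e "s/’/'/g"
}

input='“very dirty” we’re'
input=$(straighten_quotes "$input")
printf '%s\n' "$input"
```

Command substitution strips the trailing newline that printf adds, so the variable ends up holding only the cleaned text.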


But All That May Not Help...

The below is mentioned because your test script is manipulating content passed on the command line; if that's not the case in production, you can probably disregard the below.

If your script is invoked as ./yourscript “hello * ’cruel’ * world”, then information about exactly what the user entered is lost before the script is started, and nothing you can do here will fix that.

This is because $1, in that scenario, will only contain “hello; ’cruel’ and world” are in their own argv locations, and the *s will have been replaced with lists of files in the current directory (each such file substituted as a separate argument) before the script was even started. Because the shell responsible for parsing the user's command line (which is not the same shell running your script!) did not recognize the quotes as valid at the time when it ran this parsing, by the time the script is running, there's nothing you can do to recover the original data.
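The splitting described above can be reproduced without a second script. This is only a sketch: `set --` stands in for the calling shell handing arguments to `./yourscript`, and the `*` arguments are left out so the result does not depend on the current directory.

```shell
#!/usr/bin/env bash
# Sketch: simulate ./yourscript “hello ’cruel’ world” with `set --`.
# Curly quotes are ordinary characters to the shell, so the words are
# split on whitespace exactly as if no quotes had been typed.
set -- “hello ’cruel’ world”
argc=$#
first=$1
printf 'argument count: %s\n' "$argc"
printf 'first argument: %s\n' "$first"
```

The count is 3 and the first argument is only “hello, which is why nothing done inside the script can recover the original string.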

Charles Duffy
  • 1
    @sorontar, sure -- but if the title asks how to modify a *variable*, that's what I'm going to answer, because anyone else wanting to know how to modify a *variable* will be seeing that title and coming to the question with the expectation that it'll help. – Charles Duffy Nov 20 '16 at 01:38
  • ...that said, you have an excellent answer, and I'm happy to have them both here. – Charles Duffy Nov 20 '16 at 01:40

Abstract: The way to use sed to change a variable is explored, but what you really need is a way to use and edit a file. That is covered below.

Sed

The two sed lines can be replaced with this (note that -i is not used, because the input is a value, not a file):

input='“very dirty”    
we’re'

sed 's/[“”]/\"/g;s/’/'\''/g' <<<"$input"

But it should be faster (for small strings) to use the internals of the shell:

input='“very dirty”    
we’re'

input=${input//[“”]/\"}
input=${input//[’]/\'}
printf '%s\n' "$input"

$1

But there is an underlying problem with your script: you are trying to clean input received from the command line, using $1 as the source of the string. Once somebody writes:

./script  “very dirty”    
we’re

That input is lost. It is split into shell tokens, and "$1" will contain only “very.

But I do not believe that is what you really have.

file

However, you are also saying that the input comes from a file. If that is the case, then read it in with:

input="$(<infile)"           # not $1

sed 's/[“”]/\"/g;s/’/'\''/g' <<<"$input"

Or, if you don't mind editing (changing) the file, do this instead:

sed -i 's/[“”]/\"/g;s/’/'\''/g' infile
input="$(<infile)"

Or, if you are clear and certain that what is being given to the script is a filename, like:

./script infile

You can use:

infile="$1"
sed -i 's/[“”]/\"/g;s/’/'\''/g' "$infile"
input="$(<"$infile")"
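A side note on the `sed -ir` spelling used in the question: as far as I know, sed reads text attached directly to `-i` as a backup suffix, so `-ir` means "edit in place and keep a backup with suffix r" and does not turn on `-r`. A quick check, assuming GNU sed and a throwaway directory (the file name `sample` is just for illustration):

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU sed: show how "-ir" is parsed.
workdir=$(mktemp -d)
printf 'abc\n' > "$workdir/sample"

# "-ir" = in-place edit with backup suffix "r"; extended regexes are NOT enabled.
sed -ir 's/b/X/' "$workdir/sample"

backup_exists=no
[ -e "$workdir/sampler" ] && backup_exists=yes
edited=$(cat "$workdir/sample")
printf 'backup created: %s\n' "$backup_exists"
printf 'edited file: %s\n' "$edited"
rm -r "$workdir"
```

If extended regular expressions are wanted together with in-place editing, writing the options separately (`sed -i -r …`) avoids the ambiguity.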

Other comments:

  • Quote your variables.
  • Do not use the very old backtick (`…`) syntax; use $(…) instead.
  • Do not use variables in UPPER case; those are reserved for environment variables.
  • And (unless you actually meant sh) use a shebang (first line) that targets bash.
  • The command enscript most definitely requires a file, not a variable.
  • Maybe you should use evince to open the PS file directly; there is no need for the PDF conversion step unless you know you really need it.
  • I believe it is better to use a file to store the output of enscript and ps2pdf.
  • Do not hide the errors printed by the commands until everything is working as desired, then, just call the script as:

    ./script infile 2>/dev/null

    Or as required to make it less verbose.
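To illustrate the first point in the list, here is a small sketch of what happens to an unquoted variable when the file name contains a space. The name `my message.txt` and the helper `count_args` are made up for the demonstration; no file is actually created.

```shell
#!/usr/bin/env bash
# Sketch: why "$infile" needs quotes. The file name is hypothetical.
infile='my message.txt'

# Helper that reports how many arguments it received.
count_args() { printf '%s\n' "$#"; }

unquoted=$(count_args $infile)     # word-split on the space: 2 arguments
quoted=$(count_args "$infile")     # kept intact: 1 argument
printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
```

With the default IFS the unquoted form is seen as two separate words, so a command such as enscript or rm would be handed two names that do not exist.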

Final script.

If you call the script with the name of the file that enscript is going to use, something like:

./script infile

Then, the whole script will look like this (it runs in either bash or sh):

#!/usr/bin/env bash
Usage(){ echo "$0: This script requires a source file" >&2; exit 1; }
[ $# -lt 1 ] && Usage
[ ! -e "$1" ] && Usage
infile="$1"
pdir="$HOME/Desktop"
open_pdf=evince

# Straighten out curly quotes
sed -i 's/[“”]/\"/g;s/’/'\''/g' "$infile"

tmpfile="$(mktemp "$pdir"/mutt_XXXXXXXX.pdf)"
outfile="${tmpfile%.*}.ps"
enscript --font=Courier10 "$infile" -2r \
     --word-wrap --fancy-header=mutt -p "$outfile"

ps2pdf "$outfile" "$tmpfile"

"$open_pdf" "$tmpfile" >/dev/null 2>&1 &
sleep 5
rm "$tmpfile" "$outfile"
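The script above rewrites the source file in place. If the original message should stay untouched, one option is to clean a temporary copy and hand that copy to enscript instead. The following is only a sketch: `clean_copy` is a hypothetical helper, and the demo file stands in for whatever file the script receives.

```shell
#!/usr/bin/env bash
# Sketch: clean a temporary copy instead of editing the original in place.
# clean_copy is a hypothetical helper; it prints the path of the cleaned copy.
clean_copy() {
    local src=$1 copy
    copy=$(mktemp) || return 1
    # One substitution per character, so the locale does not matter.
    sed -e 's/“/"/g' -e 's/”/"/g' -e "s/’/'/g" "$src" > "$copy" || return 1
    printf '%s\n' "$copy"
}

# Demo with a throwaway file standing in for the real input file.
demo_src=$(mktemp)
printf '%s\n' '“very dirty”' 'we’re' > "$demo_src"

cleaned=$(clean_copy "$demo_src")
cleaned_text=$(cat "$cleaned")
original_text=$(cat "$demo_src")
printf '%s\n' "$cleaned_text"

rm -f "$cleaned" "$demo_src"
```

In the full script, the path printed by `clean_copy` would take the place of "$infile" on the enscript line and would be removed along with the other temporary files.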