294

Is there any comprehensive list of characters that need to be escaped in Bash? Can it be checked just with sed?

In particular, I was checking whether % needs to be escaped or not. I tried

echo "h%h" | sed 's/%/i/g'

and it worked fine without escaping %. Does this mean % does not need to be escaped? Was this a good way to check whether escaping is necessary?

And more generally: are the characters that need escaping the same in sh and bash?

jww
fedorqui

7 Answers

368

There are two easy and safe rules which work not only in sh but also bash.

1. Put the whole string in single quotes

This works for all chars except single quote itself. To escape the single quote, close the quoting before it, insert the single quote, and re-open the quoting.

'I'\''m a s@fe $tring which ends in newline
'

sed command: sed -e "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
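As a rough sketch of how rule 1 could be wrapped for reuse, the sed command above can be put in a function and checked with an eval round trip. The function name sq_escape and the sample string are made up for this illustration; they are not part of the original answer.

```bash
# Hypothetical helper (the name is an assumption): single-quote stdin per rule 1,
# rewriting each ' as '\'' and wrapping the whole input in single quotes.
sq_escape() {
    sed -e "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

original="I'm a s@fe \$tring"
quoted=$(printf '%s\n' "$original" | sq_escape)
printf '%s\n' "$quoted"

# Round trip: let the shell parse the quoted form and compare with the input.
eval "roundtrip=$quoted"
[ "$roundtrip" = "$original" ] && echo "round trip OK"
```

If the round trip succeeds, the quoted form is safe to paste into a command line as a single word.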

2. Escape every char with a backslash

This works for all characters except newline. For newline characters use single or double quotes. Empty strings must still be handled - replace with ""

\I\'\m\ \a\ \s\@\f\e\ \$\t\r\i\n\g\ \w\h\i\c\h\ \e\n\d\s\ \i\n\ \n\e\w\l\i\n\e"
"

sed command: sed -e 's/./\\&/g; 1{$s/^$/""/}; 1!s/^/"/; $!s/$/"/'.
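A small sketch of the two special cases mentioned for rule 2, the empty string and embedded newlines. It assumes a sed that accepts the command exactly as written (GNU sed does); the function name bs_escape_all is invented for this example.

```bash
# Hypothetical helper (the name is an assumption): backslash-escape every
# character on stdin, emit "" for empty input, double-quote inner newlines.
bs_escape_all() {
    sed -e 's/./\\&/g; 1{$s/^$/""/}; 1!s/^/"/; $!s/$/"/'
}

# Empty input becomes "" so that the word does not vanish.
empty_escaped=$(printf '\n' | bs_escape_all)
eval "empty=$empty_escaped"

# Two lines: the newline between them ends up inside double quotes.
two_escaped=$(printf 'a b\nc\n' | bs_escape_all)
eval "two=$two_escaped"

printf '[%s]\n' "$empty_escaped" "$two"
```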

2b. More readable version of 2

There's an easy safe set of characters, like [a-zA-Z0-9,._+:@%/-], which can be left unescaped to keep it more readable

I\'m\ a\ s@fe\ \$tring\ which\ ends\ in\ newline"
"

sed command: LC_ALL=C sed -e 's/[^a-zA-Z0-9,._+@%/-]/\\&/g; 1{$s/^$/""/}; 1!s/^/"/; $!s/$/"/'.
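The readable variant can be checked the same way. This is only a sketch: bs_escape is a made-up name, and the command is assumed to run under a sed that accepts it as written (GNU sed does).

```bash
# Hypothetical helper (the name is an assumption): escape only the characters
# outside the "safe" set, as in rule 2b.
bs_escape() {
    LC_ALL=C sed -e 's/[^a-zA-Z0-9,._+@%/-]/\\&/g; 1{$s/^$/""/}; 1!s/^/"/; $!s/$/"/'
}

original="I'm a s@fe \$tring"
escaped=$(printf '%s\n' "$original" | bs_escape)
printf '%s\n' "$escaped"      # I\'m\ a\ s@fe\ \$tring

# A plain word passes through unchanged.
plain=$(printf '%s\n' abc | bs_escape)

# Round trip through the shell parser.
eval "roundtrip=$escaped"
[ "$roundtrip" = "$original" ] && echo "round trip OK"
```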


Note that in a sed program, one can't know whether the last line of input ends with a newline byte (except when it's empty). That's why both above sed commands assume it does not. You can add a quoted newline manually.

Note that shell variables are only defined for text in the POSIX sense. Processing binary data is not defined. For the implementations that matter, binary works with the exception of NUL bytes (because variables are implemented with C strings, and meant to be used as C strings, namely program arguments), but you should switch to a "binary" locale such as latin1.


(You can easily validate the rules by reading the POSIX spec for sh. For bash, check the reference manual linked by @AustinPhillips)

ndemou
Jo So
  • 2
    Note: a good variation on #1 can be seen here: https://github.com/scop/bash-completion/blob/233421469b12d3b60e7595822cc9166016abe384/bash_completion#L138. It does not require running `sed`, but does require `bash`. – jwd Feb 10 '17 at 18:52
  • 5
    Note for anyone else (like me!) who struggles to get these working.... looks like the flavour of sed you get on OSX doesn't run these sed commands properly. They work fine on Linux though! – dalelane Jun 21 '17 at 21:58
  • @dalelane: Can't test here. Please edit when you have a version that works on both. – Jo So Jun 22 '17 at 14:34
  • Seems you missed-out should the string start with a '-' (minus), or does that only apply to filenames? - in latter case need a './' in front. – slashmais Aug 15 '17 at 09:42
  • I'm not sure what you mean. With those sed commands the input string is taken from stdin. – Jo So Aug 16 '17 at 16:12
  • Thanks a lot for this answer! We were able to use it in our build infrastructure for running commands remotely on other machines as part of distributed C++ builds ( `escape_cmd_line` in https://github.com/YugaByte/yugabyte-db/blob/master/build-support/common-build-env.sh ). I also wanted to mention that this is doable in pure Bash, i.e. we append `" '"${arg/\'/\'\\\'\'}"'"` to the escaped string for every argument `arg`. – mikhail_b Dec 06 '17 at 21:12
  • @MikhailatYugaByte, happy to help! – Jo So Dec 07 '17 at 01:10
  • Despite the green checkmark, this answer seems to be answering a different question from the question in the title. That's disappointing, since I came here wanting an answer to the question in the title. – Don Hatch Nov 02 '18 at 03:51
  • @DonHatch, at least this answer is actionable. If you are looking for a more pedantic answer, checkout the one by Matthew or just "RTFM". – Jo So Nov 02 '18 at 12:14
  • @JoSo My real-life use case is that I want to know whether a given word needs to be quoted/escaped (and, if so, a nice way to) when typing it in as an argument to a command in a bash session (actually when emitting a shell command intended for a user to copy-paste it into bash). The question title *sounds* like what I want (although it's not completely clear). For this, your answer and Matthew's answer are both safe, but they are overkill, since they quote/escape *everything* even when not needed. E.g. `abc` doesn't need any escaping, so it should come out `abc`, not `'abc'` or `\a\b\c`. – Don Hatch Nov 04 '18 at 00:13
  • @DonHatch, version 2b) of my answer will return `abc` unmodified. You will have to do a little work yourself if you want a hybrid that chooses the best approach on a word-by-word basis. – Jo So Nov 04 '18 at 20:04
  • For #1 above - Why is there a ```\$``` at the beginning of ```\$s/\$/'/```. If I remove it I get the same result.... – ikwyl6 Nov 29 '19 at 05:31
  • @ikwyl6, prepend `echo` to that sed command (to see a textual representation of what is executed), and you'll see the difference – Jo So Dec 01 '19 at 16:56
  • This came in handy trying to pipe json-post curl commands into a running docker container on one line. Big thanks. – Bob Arezina May 18 '21 at 00:56
  • Care about ***comma*** `,`! See [Why `,` ?](https://stackoverflow.com/a/27817504/1765658)!! – F. Hauri - Give Up GitHub Feb 02 '22 at 12:29
  • @F.Hauri Looking at the examples from your linked answer, comma should be fine as long as you escape `{`, shouldn't it? – Jo So Apr 24 '22 at 19:38
  • 1
    For macOS users without GNU `sed`: @fd0 has a `sed` option to escape every character: https://apple.stackexchange.com/a/363400/409134 And I wrote a solution that only escapes the control characters using `perl`: https://apple.stackexchange.com/a/458279/409134 – Nils Apr 08 '23 at 14:02
  • @nils Or make sure to start a new line before each `}` in a sed command :-) – nohillside Apr 08 '23 at 14:29
  • Single quoting isn't enough for hyphen (`-`) characters in many contexts. Single quoting with an escape `'\-'` helps in some circumstances. – ctpenrose Jul 13 '23 at 19:58
  • @ctpenrose Example of a situation where `-` is special? – Jo So Jul 13 '23 at 21:11
  • @JoSo About ctpenrose's comment: try: `filename='-dashedFilename'; ls "$filename"` this will give:`ls: invalid option -- 'e'` error. But this could by avoided by adding *pathname* (even relative: `ls "./$filename"` ). – F. Hauri - Give Up GitHub Aug 21 '23 at 07:32
  • @ctpenrose dash could be used as *command option*. To avoid error when`filename='-dashedFilename'; ls "$filename"`, use `ls "/path/to/$filename"` or `ls "./$filename"`. – F. Hauri - Give Up GitHub Aug 21 '23 at 07:44
92

Format that can be reused as shell input

Edit February 2021: ${var@Q}

Under Bash, you could store your variable content with Parameter Expansion's @ command for Parameter transformation:

${parameter@operator}
       Parameter transformation.  The expansion is either a transformation
       of the value of parameter or information about parameter itself,
       depending on the value of operator.  Each operator is a single
       letter:

       Q      The expansion is a string that is the value of  parameter
              quoted in a format that can be reused as input.
...
       A      The  expansion  is  a string in the form of an assignment
              statement or declare command  that,  if  evaluated,  will
              recreate parameter with its attributes and value.

Sample:

$ var=$'Hello\nGood world.\n'
$ echo "$var"
Hello
Good world.

$ echo "${var@Q}"
$'Hello\nGood world.\n'

$ echo "${var@A}"
var=$'Hello\nGood world.\n'
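One way to convince yourself that the @Q form really "can be reused as input" is a round trip through eval. This is a minimal sketch, assuming Bash 4.4 or later (where the @Q operator is available); the variable names are only illustrative.

```bash
# Sketch assuming Bash 4.4+; variable names are illustrative.
var=$'Hello\nGood world.\n'
quoted=${var@Q}

# Feed the quoted form back to the shell and compare with the original.
eval "copy=$quoted"
[[ $copy == "$var" ]] && echo "identical after round trip"
```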

Old answer

There is a special printf format directive (%q) built for this kind of request:

printf [-v var] format [arguments]

    %q     causes printf to output the corresponding argument
           in a format that can be reused as shell input.

Some samples:

read foo
Hello world
printf "%q\n" "$foo"
Hello\ world

printf "%q\n" $'Hello world!\n'
$'Hello world!\n'

This could be used through variables too:

printf -v var "%q" "$foo
"
echo "$var"
$'Hello world\n'
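A typical use of %q is building a command string for another shell. The following is a sketch under that assumption: the array contents and variable names are invented, and `bash -c` stands in for whatever would receive the string (ssh, su, and so on).

```bash
# Illustrative arguments containing spaces and shell metacharacters.
args=("file with spaces.txt" "it's" '$HOME' 'a;b')

# The format '%q ' is reused for every argument, so the whole command line
# (command name, format string, arguments) is quoted word by word.
printf -v cmd '%q ' printf '%s\n' "${args[@]}"

# A second shell parses the string; each argument should arrive unchanged.
output=$(bash -c "$cmd")
expected=$(printf '%s\n' "${args[@]}")
[[ $output == "$expected" ]] && echo "arguments survived intact"
```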

Quick check with all (128) ASCII bytes:

Note that all bytes from 128 to 255 have to be escaped.

for i in {0..127} ;do
    printf -v var \\%o $i
    printf -v var $var
    printf -v res "%q" "$var"
    esc=E
    [ "$var" = "$res" ] && esc=-
    printf "%02X %s %-7s\n" $i $esc "$res"
done |
    column

This should render something like:

00 E ''         1A E $'\032'    34 - 4          4E - N          68 - h
01 E $'\001'    1B E $'\E'      35 - 5          4F - O          69 - i
02 E $'\002'    1C E $'\034'    36 - 6          50 - P          6A - j
03 E $'\003'    1D E $'\035'    37 - 7          51 - Q          6B - k
04 E $'\004'    1E E $'\036'    38 - 8          52 - R          6C - l
05 E $'\005'    1F E $'\037'    39 - 9          53 - S          6D - m
06 E $'\006'    20 E \          3A - :          54 - T          6E - n
07 E $'\a'      21 E \!         3B E \;         55 - U          6F - o
08 E $'\b'      22 E \"         3C E \<         56 - V          70 - p
09 E $'\t'      23 E \#         3D - =          57 - W          71 - q
0A E $'\n'      24 E \$         3E E \>         58 - X          72 - r
0B E $'\v'      25 - %          3F E \?         59 - Y          73 - s
0C E $'\f'      26 E \&         40 - @          5A - Z          74 - t
0D E $'\r'      27 E \'         41 - A          5B E \[         75 - u
0E E $'\016'    28 E \(         42 - B          5C E \\         76 - v
0F E $'\017'    29 E \)         43 - C          5D E \]         77 - w
10 E $'\020'    2A E \*         44 - D          5E E \^         78 - x
11 E $'\021'    2B - +          45 - E          5F - _          79 - y
12 E $'\022'    2C E \,         46 - F          60 E \`         7A - z
13 E $'\023'    2D - -          47 - G          61 - a          7B E \{
14 E $'\024'    2E - .          48 - H          62 - b          7C E \|
15 E $'\025'    2F - /          49 - I          63 - c          7D E \}
16 E $'\026'    30 - 0          4A - J          64 - d          7E E \~
17 E $'\027'    31 - 1          4B - K          65 - e          7F E $'\177'
18 E $'\030'    32 - 2          4C - L          66 - f
19 E $'\031'    33 - 3          4D - M          67 - g

The first field is the hexadecimal value of the byte, the second contains E if the character needs to be escaped, and the third shows the escaped representation of the character.

Small script looking for limited bunch of characters

For fun, here is another way for looping over a string, grouping all characters by the need to be escaped.

quickListOfSpecialCharsFromString() {
    local {q,}char bunch{_0,_1} \
        special="${1:-'\`\"/\!@#\$%^&*()-_+={\}[]|;:,.<>? '}"
    while IFS= LANG=C LC_ALL=C read -d '' -rn 1 char; do
        printf -v qchar %q "$char"
        [[ $char == "$qchar" ]]
        local -n bunch=bunch_$?
        bunch+=("${char@Q}")
    done < <(printf %s "$special");
    printf 'Characters that %sneed to be escaped:\n%s\n' \
        "don't " "${bunch_0[*]}" "" "${bunch_1[*]}"
}
quickListOfSpecialCharsFromString $'`!@#$%^&*()-_+={}|[]\\;\':",.<>?/ '
Characters that don't need to be escaped:
'@' '%' '-' '_' '+' '=' ':' '.' '/'
Characters that need to be escaped:
'`' '!' '#' '$' '^' '&' '*' '(' ')' '{' '}' '|' '[' ']' '\' ';' \' '"' ',' '<' '>' '?' ' '

Why ,?

You may have noticed that some characters do not always need to be escaped, such as the comma (,), } and {.

They are harmless in some contexts:

echo test 1, 2, 3 and 4,5.
test 1, 2, 3 and 4,5.

or

echo test { 1, 2, 3 }
test { 1, 2, 3 }

but take care:

echo test{1,2,3}
test1 test2 test3

echo test\ {1,2,3}
test 1 test 2 test 3

echo test\ {\ 1,\ 2,\ 3\ }
test  1 test  2 test  3

echo test\ {\ 1\,\ 2,\ 3\ }
test  1, 2 test  3

See the Brace Expansion section of bash's man page:

  man -P'less +/Brace\ Expansion' bash
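The effect of quoting on brace expansion can also be seen by counting words. This is a small sketch that assumes Bash with brace expansion enabled (the default); count_words is a made-up helper.

```bash
# Hypothetical helper: print how many arguments it received.
count_words() { echo "$#"; }

n_expanded=$(count_words test{1,2,3})      # brace expansion: 3 words
n_quoted=$(count_words 'test{1,2,3}')      # single-quoted: 1 word
n_comma=$(count_words test{1\,2,3})        # escaped comma: test1,2 and test3
n_brace=$(count_words test\{1,2,3})        # escaped brace: 1 word

echo "$n_expanded $n_quoted $n_comma $n_brace"   # 3 1 2 1
```

So the comma only matters between unquoted braces, which is presumably why %q escapes it everywhere to be safe.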

F. Hauri - Give Up GitHub
  • This has the problem that, when calling printf via bash/sh, the string must first be shell-escaped for bash/sh – ThorSummoner Jul 10 '15 at 19:36
  • 2
    @ThorSummoner, not if you pass the string as a literal argument to the shell from a different language (where you presumably already know how to quote). In Python: `subprocess.Popen(['bash', '-c', 'printf "%q\0" "$@"', '_', arbitrary_string], stdin=subprocess.PIPE, stdout=subprocess.PIPE).communicate()` will give you a properly shell-quoted version of `arbitrary_string`. – Charles Duffy Jul 16 '15 at 23:00
  • 1
    FYI bash's `%q` was broken for a long time - If my mind serves me well, an error was fixed (but might still be broken) in 2013 after being broken for ~10 years. So don't rely on it. – Jo So Feb 03 '17 at 17:36
  • @CharlesDuffy Of course, once you are in Python land, `shlex.quote()` (>= 3.3, `pipes.quote()` - undocumented - for older versions) will also do the job and produce a more human-readable version (adding quotes and escaping, as necessary) of most strings, without the need to spawn a shell. – Thomas Perl Oct 21 '19 at 10:32
  • 2
    Thank you for adding the special notes about `,`. I was surprised to learn that built-in Bash `printf -- %q ','` gives `\,`, but `/usr/bin/printf -- %q ','` gives `,` (unescaped). Same for other chars: `{`, `|`, `}`, `~`. – kevinarpe May 15 '20 at 14:22
  • _"Note that all bytes from 128 to 255 have to be escaped"_ why? none of them have special meaning to the shell, and e.g. stuff like `echo äöä` works fine, even though `ä` and `ö` contain bytes with values in that range in UTF-8 (and in legacy encodings). – ilkkachu Feb 24 '21 at 18:38
  • 1
    @ilkkachu Depending on local config, using utf or iso as default, playing with bytes between 128 to 255 could lead to strange behaviour – F. Hauri - Give Up GitHub Feb 25 '21 at 10:29
  • 1
    That new `@Q` is very useful! – fedorqui Feb 02 '22 at 09:49
  • Thank you for the February 2021 edit, `@A` was exactly what I needed ! – Lenormju Jul 06 '22 at 10:02
52

To save someone else from having to RTFM... in bash:

Enclosing characters in double quotes preserves the literal value of all characters within the quotes, with the exception of $, `, \, and, when history expansion is enabled, !.

...so if you escape those (and the quote itself, of course) you're probably okay.
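As a sketch of that rule, the four characters can be backslash-escaped with sed and the result placed between double quotes. The function name dq_escape is an assumption of this example. It deliberately leaves ! alone: inside double quotes a backslash before ! is not removed, and history expansion only applies to interactive shells.

```bash
# Hypothetical helper (the name is an assumption): escape $, `, " and \
# so the text can be placed between double quotes.
dq_escape() {
    sed -e 's/[$`"\\]/\\&/g'
}

original='say "hi" to $USER `date` \n'
escaped=$(printf '%s\n' "$original" | dq_escape)
printf '"%s"\n' "$escaped"

# Round trip: parse the double-quoted form and compare with the input.
eval "roundtrip=\"$escaped\""
[ "$roundtrip" = "$original" ] && echo "round trip OK"
```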

If you take a more conservative 'when in doubt, escape it' approach, you can avoid accidentally producing escape sequences with special meaning by leaving identifier characters (i.e. ASCII letters, digits, or '_') unescaped. It's very unlikely that these would ever have special meaning (even in some weird POSIX-ish shell) and thus need to be escaped.

fedorqui
Matthew
  • 2
    here is the manual quoted above: https://www.gnu.org/software/bash/manual/html_node/Double-Quotes.html – code_monk Dec 21 '16 at 21:33
  • 1
    This is a short, sweet and mostly correct answer (+1 for that) but maybe it's even better to use single quotes - see my longer answer. – Jo So Feb 03 '17 at 17:39
49

Using the printf '%q' technique, we can run a loop to find out which characters are special:

#!/bin/bash
special=$'`!@#$%^&*()-_+={}|[]\\;\':",.<>?/ '
for ((i=0; i < ${#special}; i++)); do
    char="${special:i:1}"
    printf -v q_char '%q' "$char"
    if [[ "$char" != "$q_char" ]]; then
        printf 'Yes, character %s needs to be escaped\n' "$char"
    else
        printf 'No, character %s does not need to be escaped\n' "$char"
    fi
done | sort

It gives this output:

No, character % does not need to be escaped
No, character + does not need to be escaped
No, character - does not need to be escaped
No, character . does not need to be escaped
No, character / does not need to be escaped
No, character : does not need to be escaped
No, character = does not need to be escaped
No, character @ does not need to be escaped
No, character _ does not need to be escaped
Yes, character   needs to be escaped
Yes, character ! needs to be escaped
Yes, character " needs to be escaped
Yes, character # needs to be escaped
Yes, character $ needs to be escaped
Yes, character & needs to be escaped
Yes, character ' needs to be escaped
Yes, character ( needs to be escaped
Yes, character ) needs to be escaped
Yes, character * needs to be escaped
Yes, character , needs to be escaped
Yes, character ; needs to be escaped
Yes, character < needs to be escaped
Yes, character > needs to be escaped
Yes, character ? needs to be escaped
Yes, character [ needs to be escaped
Yes, character \ needs to be escaped
Yes, character ] needs to be escaped
Yes, character ^ needs to be escaped
Yes, character ` needs to be escaped
Yes, character { needs to be escaped
Yes, character | needs to be escaped
Yes, character } needs to be escaped

Some of the results, like the one for the comma (,), look a little suspicious. It would be interesting to get @CharlesDuffy's input on this.

F. Hauri - Give Up GitHub
codeforester
  • 2
    You may read answer to *`,` look a little suspicious* at last paragraph of [my answer](https://stackoverflow.com/a/27817504/1765658) – F. Hauri - Give Up GitHub May 17 '18 at 19:00
  • 2
    Keep in mind that `%q` doesn't know where within the shell you are planning to use the character, so it will escape all characters that can have a special meaning in any possible shell context. `,` itself has no special meaning to the shell but as @F.Hauri has pointed out in his reply, it does have a special meaning within `{...}` brace expansion: https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Brace-Expansion This is like ! which also only requires escaping in specific situations, not in general: `echo Hello World!` works just fine, yet `echo test!test` will fail. – Mecki May 17 '19 at 19:25
18

The characters that need escaping differ between the Bourne or POSIX shell and Bash. Very generally, Bash is a superset of those shells, so anything you escape in sh should also be escaped in Bash.

A nice general rule would be "if in doubt, escape it". But escaping some characters gives them a special meaning, like \n. These are listed in the man bash pages under Quoting and echo.
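To illustrate the \n case, here is a small sketch of how the same two characters behave in different quoting contexts. It assumes Bash for the $'...' form; the variable names are only illustrative.

```bash
unquoted=$(printf '%s' \n)       # unquoted backslash just quotes the n
single=$(printf '%s' '\n')       # single quotes: backslash and n, literally
ansi_c=$(printf '%s' $'a\nb')    # $'...' (Bash): \n becomes a newline

# Lengths show what each form produced.
echo "${#unquoted} ${#single} ${#ansi_c}"   # 1 2 3
```

So a backslash is not always a neutral "escape": in contexts such as $'...' or a printf format string, \n is interpreted rather than taken literally.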

Other than that, escape any character that is not alphanumeric, it is safer. I don't know of a single definitive list.

The man pages list them all somewhere, but not in one place. Learn the language, that is the way to be sure.

One that has caught me out is !. This is a special character (history expansion) in Bash (and csh) but not in Korn shell. Even echo "Hello world!" gives problems. Using single-quotes, as usual, removes the special meaning.

cdarke
  • 1
    I specially like the _A nice general rule would be "if in doubt, escape it"_ advice. Still have the doubt whether checking with `sed` is good enough to see if it has to be escaped. Thanks for your answer! – fedorqui Apr 04 '13 at 15:29
  • 2
    @fedorqui: Checking with `sed` is not necessary, you could check with almost anything. `sed` is not the issue, `bash` is. Inside single quotes there are no special characters (except single quotes), you can't even escape characters there. A `sed` command should usually be inside single quotes because RE metacharacters have too many overlaps with shell metacharacters to be safe. The exception is when embedding shell variables, which has to be done carefully. – cdarke Apr 05 '13 at 09:11
  • 5
    Check with `echo`. If you get out what you put in, it doesn't need to be escaped. :) – Mark Reed Jul 28 '14 at 15:57
6

I presume that you're talking about bash strings. Different types of strings have different escaping requirements, e.g. single-quoted strings are different from double-quoted strings.

The best reference is the Quoting section of the bash manual.

It explains which characters need escaping. Note that some characters may need escaping depending on which options are enabled, such as history expansion.
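A minimal side-by-side sketch of the two string types (the variable names and sample text are made up for this example):

```bash
name=world

# Single quotes: nothing is special, so nothing needs (or allows) escaping.
single='Hello $name \ "quoted"'

# Double quotes: $ is expanded, and \ and " must be escaped to stay literal.
double="Hello $name \\ \"quoted\""

printf '%s\n' "$single"   # Hello $name \ "quoted"
printf '%s\n' "$double"   # Hello world \ "quoted"
```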

Austin Phillips
  • 3
    So it confirms that escaping is such a _jungle_ without an easy solution, will have to check each case. Thanks! – fedorqui Apr 04 '13 at 15:30
  • @fedorqui As with any language, there's a set of rules to be followed. For bash string escaping, the set of rules is quite small as described in the manual. The easiest string to use is single quotes since nothing needs escaping. However, there is no way to include a single quote in a single quoted string. – Austin Phillips Apr 04 '13 at 22:11
  • @fedorqui. It's **not** a jungle. Escaping is quite doable. See my new post. – Jo So Nov 18 '13 at 16:47
  • @fedorqui You can't use a single quote inside a single-quoted string but you can "escape" it with something like: 'text'"'"'more text' – CR. Nov 25 '14 at 03:22
5

I noticed that bash automatically escapes some characters when using auto-complete.

For example, if you have a directory named dir:A, bash will auto-complete to dir\:A

Using this, I ran some experiments using characters of the ASCII table and derived the following lists:

Characters that bash escapes on auto-complete: (includes space)

 !"$&'()*,:;<=>?@[\]^`{|}

Characters that bash does not escape:

#%+-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~

(I excluded /, as it cannot be used in directory names)

yuri
  • 3
    If you really wanted to have a comprehensive list, I'd suggest looking at which characters `printf %q` does and does not modify if passed as an argument -- ideally, going through the entire characterset. – Charles Duffy Jan 30 '16 at 04:23
  • There are instances where even with the apostrophe string, you may wish to escape letters and numbers to produce special-characters. For example: tr '\n' '\t' which translates newline characters into tab characters. – Dick Guertin Aug 09 '16 at 17:54
  • @CharlesDuffy The characters that auto-complete escapes are somewhat different from what `printf %q` does, I ran into this testing a pathname containing the 'home' tilde (which %q escapes, causing a problem for me, where auto-complete does not). – Compholio Nov 03 '21 at 19:46