
What did I do wrong here?

Trying to match any string that contains spaces, lowercase, uppercase, or numbers. Special characters would be nice too, but I think that requires escaping certain characters.

TEST="THIS is a TEST title with some numbers 12345 and special char *&^%$#"

if [[ "$TEST" =~ [^a-zA-Z0-9\ ] ]]; then BLAH; fi

This obviously only tests for upper, lower, numbers, and spaces. Doesn't work though.

* UPDATE *

I guess I should have been more specific. Here is the actual real line of code.

if [[ "$TITLE" =~ [^a-zA-Z0-9\ ] ]]; then RETURN="FAIL" && ERROR="ERROR: Title can only contain upper and lowercase letters, numbers, and spaces!"; fi

* UPDATE *

./anm.sh: line 265: syntax error in conditional expression
./anm.sh: line 265: syntax error near `&*#]'
./anm.sh: line 265: `  if [[ ! "$TITLE" =~ [a-zA-Z0-9 $%^\&*#] ]]; then RETURN="FAIL" && ERROR="ERROR: Title can only contain upper and lowercase letters, numbers, and spaces!"; return; fi'
codeforester
Atomiklan
  • Which shell are you actually using? /bin/sh? /bin/bash? /bin/csh? – Willem Van Onsem Sep 10 '13 at 02:48
  • It's safer to put the regex in a variable. `re='...whatever...'; [[ $string =~ $re ]]` (without quotes -- this is one of the rare cases where they'll break something that would work without them). – Charles Duffy Sep 10 '13 at 03:49
  • Put single quotes around the assignment instead. Double quotes will not protect the special characters properly. – tripleee Sep 10 '13 at 04:10
  • Many thx Charles! It’s still ok not putting it in a variable, but it must NOT be quoted at all! For example: `[[ $var =~ .* ]]` for match regex `.*` (anything). I guess that if you use quotes, the quotes themselves are considered part of the regex… – Stéphane Apr 19 '17 at 23:20
  • gotcha summary I found: **(1.) save the pattern in a variable using single quotes `pattern='^hello[0-9]*$'` (2.) in the double square expression if you need regex matching do NOT quote the pattern** because quoting DISABLES the regex pattern matching. **(i.e. the expression `[[ "$x" =~ $pattern ]]` will match using regex and the expression `[[ "$x" =~ "$pattern" ]]` disables regex matching and is equivalent to `[[ "$x" == "$pattern" ]]`** ). – Trevor Boyd Smith Oct 16 '17 at 15:18

4 Answers


There are a couple of important things to know about bash's [[ ]] construction. The first:

Word splitting and pathname expansion are not performed on the words between the [[ and ]]; tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution, process substitution, and quote removal are performed.

The second thing:

An additional binary operator, ‘=~’, is available,... the string to the right of the operator is considered an extended regular expression and matched accordingly... Any part of the pattern may be quoted to force it to be matched as a string.

Consequently, $v on either side of the =~ will be expanded to the value of that variable, but the result will not be word-split or pathname-expanded. In other words, it's perfectly safe to leave variable expansions unquoted on the left-hand side, but you need to know that variable expansions will happen on the right-hand side.

So if you write `[[ $x =~ [$0-9a-zA-Z] ]]`, the `$0` inside the regex on the right will be expanded before the regex is interpreted, which will probably cause the regex to fail to compile (unless the expansion of `$0` ends with a digit or a punctuation symbol whose ASCII value is less than that of a digit). If you quote the right-hand side like so, `[[ $x =~ "[$0-9a-zA-Z]" ]]`, then the right-hand side will be treated as an ordinary string, not a regex (and `$0` will still be expanded). What you really want in this case is `[[ $x =~ [\$0-9a-zA-Z] ]]`.
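
The three readings of the right-hand side described above can be checked with a small script. This is only a sketch: the variable names (`lo`, `expanded`, `escaped`, `quoted`) are invented for the example, and `$lo` stands in for `$0` so that the result does not depend on the script's name. It assumes bash 3.2 or later.

```shell
#!/usr/bin/env bash
# Sketch: three ways the right-hand side of =~ can be read.
# All variable names here are made up for illustration.

lo=a

# 1. Unquoted expansion: $lo is substituted first, so the regex is [a-c].
if [[ b =~ [$lo-c] ]]; then expanded=match; else expanded=nomatch; fi

# 2. Escaped dollar sign: the bracket expression contains a literal $.
if [[ '$' =~ [\$0-9a-zA-Z] ]]; then escaped=match; else escaped=nomatch; fi

# 3. Quoted right-hand side: compared as a plain string, not as a regex,
#    so a lone $ is no longer enough to match.
if [[ '$' =~ "[\$0-9a-zA-Z]" ]]; then quoted=match; else quoted=nomatch; fi

printf '%s %s %s\n' "$expanded" "$escaped" "$quoted"
```

The first two tests should report a match and the third should not, because quoting turns the whole bracket expression into literal text.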

Similarly, the expression between the [[ and ]] is split into words before the regex is interpreted. So spaces in the regex need to be escaped or quoted. If you wanted to match letters, digits or spaces you could use: [[ $x =~ [0-9a-zA-Z\ ] ]]. Other characters similarly need to be escaped, like #, which would start a comment if not quoted. Of course, you can put the pattern into a variable:

pat="[0-9a-zA-Z ]"
if [[ $x =~ $pat ]]; then ...

For regexes which contain lots of characters which would need to be escaped or quoted to pass through bash's lexer, many people prefer this style. But beware: In this case, you cannot quote the variable expansion:

# This doesn't work:
if [[ $x =~ "$pat" ]]; then ...
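
To see the difference side by side, here is a minimal sketch. The pattern and test strings are hypothetical, chosen only to make the contrast visible; it assumes bash 3.2 or later.

```shell
#!/usr/bin/env bash
# Sketch: the same pattern variable, unquoted and quoted.
pat='^[0-9]+$'   # hypothetical pattern: one or more digits, nothing else
x=12345

# Unquoted: $pat is interpreted as a regular expression.
if [[ $x =~ $pat ]]; then unquoted=match; else unquoted=nomatch; fi

# Quoted: $pat is treated as literal text, so digits alone do not match.
if [[ $x =~ "$pat" ]]; then quoted=match; else quoted=nomatch; fi

# The quoted form succeeds only when the string contains the pattern text itself.
if [[ 'id ^[0-9]+$' =~ "$pat" ]]; then literal=match; else literal=nomatch; fi

printf 'unquoted=%s quoted=%s literal=%s\n' "$unquoted" "$quoted" "$literal"
```

Assigning the pattern with single quotes, as here, keeps the shell from touching `$` or backslashes before the regex engine sees them.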

Finally, I think what you are trying to do is to verify that the variable only contains valid characters. The easiest way to do this check is to make sure that it does not contain an invalid character. In other words, an expression like this:

valid='0-9a-zA-Z $%&#' # add almost whatever else you want to allow to the list
if [[ ! $x =~ [^$valid] ]]; then ...

`!` negates the test, turning it into a "does not match" operator, and a `[^...]` regex character class means "any character other than ...".
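
Wrapped in a function, the same check might look like the sketch below. The function name `is_valid_title` and the intermediate variable `bad` are assumptions made for this example; the allowed set is the one shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when $1 contains only allowed characters.
is_valid_title() {
    local valid='0-9a-zA-Z $%&#'   # allowed set, as in the snippet above
    local bad="[^$valid]"           # regex: any single character outside the set
    # Valid means "no disallowed character was found".
    [[ ! $1 =~ $bad ]]
}

# Usage: report the verdict for a couple of sample titles.
for title in 'THIS is a TEST 12345 $%&#' 'bad*title'; do
    if is_valid_title "$title"; then
        echo "ok:   $title"
    else
        echo "FAIL: $title"
    fi
done
```

Note that an empty string passes this check, since it contains no disallowed character. If empty titles should be rejected, that needs a separate test such as `[[ -n $1 ]]`.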

The combination of parameter expansion and regex operators can make bash regular expression syntax "almost readable", but there are still some gotchas. (Aren't there always?) One is that you could not put `]` into `$valid`, even if `$valid` were quoted, except at the very beginning. (That's a POSIX regex rule: if you want to include `]` in a character class, it needs to go at the beginning. A `-` can go at the beginning or the end, so if you need both `]` and `-`, you need to start with `]` and end with `-`, leading to the regex "I know what I'm doing" emoticon: `[][-]`.)

codeforester
rici
  • Just want to point out that _"!~ is the "does not match" operator"_ is not true. Either use `if ! [[ $x =~ $y ]]` or `if [[ ! $x =~ $y ]]` – alcohol Jun 01 '15 at 07:48
  • shellchecker disagrees... `SC2076: Don't quote rhs of =~, it'll match literally rather than as a regex.` – leonardo Dec 09 '16 at 20:12
  • @leonard: how does that differ from my statement "you cannot quote the variable expansion" and the comment "This doesn't work"? What is unclear about that? – rici Dec 10 '16 at 04:35
  • indeed, with you, @rici. this post was invaluable to me. i tried single and double quotes, but it did not occur to me to remove the quotes altogether. oh bash ! – orion elenzil Jan 24 '19 at 21:52
  • I have this pattern in a variable `PATTERN="^[\da-zA-Z]{12}$"`, and my `if` is `if [[ $DEVICE=~ $PATTERN ]];`. I'm getting `DEVICE` from the input parameters as `DEVICE=$1`. I'm calling the script with a correct 12-char number like `0004F31198C0` but it is not working. – m4l490n Jan 21 '20 at 20:24
  • @m4l490n: What makes you think that `[\d]` matches anything other than the letter d? I'd use `^[[:xdigit:]]{12}$` (See [man 7 regex](http://man7.org/linux/man-pages/man7/regex.7.html) for Posix regex syntax) – rici Jan 21 '20 at 21:09
  • @rici yeah, apparently that doesn't work here. It should match any digit. I like more the `xdigit`. Thanks! – m4l490n Jan 21 '20 at 21:27
  • @m4l490n: That's a bit overpermissive unless you mistyped it. But `[[:xdigit:]]` is *precisely* what you want (any hex digit). Posix regexes don't have magic backslash escapes. – rici Jan 21 '20 at 21:32
  • @rici you are right, now that I read again my comment is poorly worded. What I meant is that `\d` doesn't work in a bash script, also, I didn't know about the existence of `[[:xdigit:]]` and after looking it up I like it better because as you mentioned that is exactly what I need. – m4l490n Jan 22 '20 at 14:49
  • Word splitting occurs in the expression between the [[ and ]] ? – rosshjb Jun 13 '20 at 20:44
  • @jinbeomhong: the expression itself is separated into words as usual, using whitespace. But parameter and command expansions are not word-split. – rici Jun 13 '20 at 21:15
  • @rici Could you explain it in detail? In bash manual, `Word splitting and filename expansion are not performed on the words between the [[ and ]];`, but you said `Similarly, the expression between the [[ and ]] is split into words before the regex is interpreted.` – rosshjb Jun 14 '20 at 09:23
  • @jinbeomhong: I'm not saying anything different from the bash manual. "the **words** between the `[[` and `]]`" are parsed out of the program text, the same way command lines are parsed into words. Unlike command lines, though, the words are not split after expansions. – rici Jun 14 '20 at 11:34

In case someone wanted an example using variables...

#!/bin/bash

# Only continue for 'develop' or 'release/*' branches
BRANCH_REGEX='^(develop$|release/.*)'

if [[ $BRANCH =~ $BRANCH_REGEX ]];
then
    echo "BRANCH '$BRANCH' matches BRANCH_REGEX '$BRANCH_REGEX'"
else
    echo "BRANCH '$BRANCH' DOES NOT MATCH BRANCH_REGEX '$BRANCH_REGEX'"
fi
Oliver Pearmain

I'd prefer to use [:punct:] for that. Also, a-zA-Z0-9 could be just [:alnum:]:

[[ $TEST =~ ^[[:alnum:][:blank:][:punct:]]+$ ]]
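
As a rough sketch, the same test can be wrapped in a function and tried against the string from the question. The function name `is_printable_title` is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when $1 is non-empty and consists only of
# letters, digits, blanks (space or tab) and punctuation.
is_printable_title() {
    local re='^[[:alnum:][:blank:][:punct:]]+$'
    [[ $1 =~ $re ]]
}

TEST="THIS is a TEST title with some numbers 12345 and special char *&^%$#"

if is_printable_title "$TEST"; then
    echo "title accepted"
else
    echo "title rejected"
fi
```

Because the pattern is anchored with `^` and `$` and uses `+`, an empty string is rejected, as is any string containing a control character.
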
konsolebox

Or you might be looking at this question because you happened to make a silly typo like I did and have the =~ reversed to ~=

shonky linux user
  • It seems that the pattern must at least start with a `^` and end with a dollar sign `$` in order to work. It's the only way of doing it such that the resulting truth value is correct for me. (???) – von spotz May 31 '21 at 07:12
  • I still manage to get this wrong in javascript – Sridhar Sarnobat Aug 21 '21 at 02:46