
Given a bash variable holding the following string:

INPUT="Cookie: cf_clearance=foo; __cfduid=bar;"

Why does the substitution `${INPUT/cf_clearance=[^;]*;/}` produce the output `Cookie: ` instead of what I'd expect, `Cookie: __cfduid=bar;`?

Testing the same regex in online regex validators confirms that `cf_clearance=[^;]*;` should match only `cf_clearance=foo;`, not the rest of the string.

What am I doing wrong here?

oguz ismail
Oscar Hierro

3 Answers


Use bash's actual regular-expression matching (the `=~` operator) instead of parameter expansion, which works with glob patterns, not regular expressions.

[[ $INPUT =~ (.*)(cf_clearance=[^;]*;)(.*) ]]
ans=${BASH_REMATCH[1]}${BASH_REMATCH[3]}
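As a hedged sketch, the `=~` approach above can be wrapped in a function that leaves the input untouched when nothing matches. `strip_cf_regex` is a hypothetical name, not part of the original answer, and the sketch assumes the cookie appears at most once. Note that the space after the semicolon is not part of the match, so the result keeps two spaces after `Cookie:`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop one "cf_clearance=...;" item using bash's =~ operator.
# Assumes the item appears at most once in the input.
strip_cf_regex() {
    local input=$1
    # Keep the regex in a variable and expand it unquoted on the right of =~;
    # a quoted right-hand side would be matched as a literal string instead.
    local re='(.*)(cf_clearance=[^;]*;)(.*)'
    if [[ $input =~ $re ]]; then
        # Glue together what came before and after the matched item.
        printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
    else
        # No match: return the input unchanged.
        printf '%s\n' "$input"
    fi
}
```

Typical use would be `INPUT=$(strip_cf_regex "$INPUT")`.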

You can also use an extended (extglob) pattern, which is equivalent in power to a regular expression:

shopt -s extglob
echo "${INPUT/cf_clearance=*([^;]);/}"
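One possible extension of the extglob version, assuming you also want the space after the semicolon gone and every occurrence removed: append `*( )` to the pattern and use `//` instead of `/`. `strip_cf_glob` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Enable extglob first, on its own line: in some contexts (e.g. inside [[ ]])
# bash needs it set when a line is parsed, not just when it runs.
shopt -s extglob

# Hypothetical helper: remove every "cf_clearance=...;" item plus any spaces after it.
strip_cf_glob() {
    local input=$1
    # *([^;]) = zero or more non-semicolons; *( ) = zero or more spaces.
    # The // form replaces all matches instead of only the first one.
    printf '%s\n' "${input//cf_clearance=*([^;]);*( )/}"
}
```

Called as `INPUT=$(strip_cf_glob "$INPUT")`, this should give `Cookie: __cfduid=bar;`, the output the question expected.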
chepner
  • I think that's the correct answer. Note that both of those options are bash-only; they won't work in other shells (e.g. Ubuntu's `dash`, `ksh`, `zsh`, etc.). The question was Bash-specific, so this isn't an issue with this answer; it's just important to note. – Zac B Jan 31 '18 at 13:46
  • It won't work in Python, Perl, Haskell, Ruby, or any number of other languages either; would you like to note that? There are solutions for `ksh` and `zsh` available (probably not very different from the `bash` solution), while POSIX shell would require multiple uses of the `expr` command. – chepner Jan 31 '18 at 13:50
  • I didn't mean to criticize the answer; I think it's good and correct. I pointed out the other shells' incompatibilities because "bashisms" are a common source of confusion for beginners coding in shell. It's just something to be aware of; the question is about Bash and again your answer is 100% correct. – Zac B Jan 31 '18 at 13:53
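For the POSIX-shell route chepner mentions, a minimal sketch using `expr` might look like the following. It assumes `cf_clearance` appears at most once, and `strip_cf` is a hypothetical name. Only POSIX features are used, so it should also run under `dash`.

```bash
#!/bin/sh
# Hypothetical helper: POSIX-only removal using expr(1) instead of bash features.
# Assumes "cf_clearance=...;" appears at most once. The ( ) body runs in a
# subshell, so head and tail do not leak into the caller.
strip_cf() (
    case $1 in
        *cf_clearance=*\;*)
            # expr's ":" matches a BRE anchored at the start and prints what \( \) captured.
            # The leading "x" stops expr from reading input like "-x" or "length" as an operator.
            head=$(expr "x$1" : 'x\(.*\)cf_clearance=[^;]*;')
            tail=$(expr "x$1" : 'x.*cf_clearance=[^;]*;\(.*\)')
            printf '%s\n' "$head$tail" ;;
        *)
            # Nothing to remove: print the input unchanged.
            printf '%s\n' "$1" ;;
    esac
)
```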

Use sed:

INPUT=$(sed 's/cf_clearance=[^;]*;//' <<< "$INPUT")
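If the extra space left behind matters, one possible tweak (an assumption about what you want, not part of the answer) is to let the expression also consume the spaces after the semicolon and add the `g` flag so every occurrence on the line is removed:

```bash
#!/usr/bin/env bash
INPUT="Cookie: cf_clearance=foo; __cfduid=bar;"
# "; *" also consumes the spaces after the semicolon, and the g flag removes
# every occurrence on the line rather than only the first one.
INPUT=$(sed 's/cf_clearance=[^;]*; *//g' <<< "$INPUT")
printf '%s\n' "$INPUT"    # prints "Cookie: __cfduid=bar;"
```

This still spawns a `sed` process per call, which the pure-bash expansions avoid.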
iBug
  • Thanks. I'm aware I can use sed or even awk, but I'd like to understand why the bash substitution is not working in this case. – Oscar Hierro Jan 31 '18 at 12:42
  • @oscahie AFAIK Bash does not support regex with its builtin substitution, only wildcards. – iBug Jan 31 '18 at 12:44
  • I was unsure which answer to accept. TBH this is quite a straightforward and easy approach, so it's perfectly valid too. – Oscar Hierro Jan 31 '18 at 15:57
  • @oscahie It's up to you to decide which one to accept. As you participate more on Stack Overflow, you'll face more cases like this one, so don't worry too much. Pick the one you like and move on. Cheers – iBug Jan 31 '18 at 15:58

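As you have been told in the comments, bash parameter substitution only supports glob patterns, not regular expressions. In a glob, `[^;]` matches exactly one non-semicolon character and `*` matches any string at all, semicolons included; since `${var/pattern/}` replaces the longest match, your pattern runs all the way to the final `;`. So the problem is really with your expectation, not with your code per se.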
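A small check, runnable as-is, makes the glob reading of the pattern visible; `pat` and `result` are just illustrative variable names.

```bash
#!/usr/bin/env bash
INPUT="Cookie: cf_clearance=foo; __cfduid=bar;"

# Glob held in a variable so the ';' needs no quoting; unquoted on the
# right of == it is treated as a pattern.
pat='[^;]*;'
if [[ "foo; __cfduid=bar;" == $pat ]]; then
    echo "[^;]*; matches text that has a ';' in the middle"
fi

# The question's substitution: the longest match swallows both cookies.
result="${INPUT/cf_clearance=[^;]*;/}"
printf '[%s]\n' "$result"    # prints "[Cookie: ]"
```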

If you know that the match can be anchored to the beginning of the string, you can use the `${INPUT#prefix}` parameter substitution, which removes the shortest matching prefix, and then add `Cookie: ` back in front:

echo "Cookie: ${INPUT#Cookie: cf_clearance=*;}"
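The shortest/longest distinction is what makes the `#` version work where the question's `/` substitution did not. A sketch comparing `#` (shortest prefix) with `##` (longest prefix) on the question's input:

```bash
#!/usr/bin/env bash
INPUT="Cookie: cf_clearance=foo; __cfduid=bar;"
# ${var#pattern} strips the SHORTEST matching prefix, so * stops at the first ';'.
shortest="${INPUT#Cookie: cf_clearance=*;}"
# ${var##pattern} strips the LONGEST matching prefix, so * runs on to the last ';'.
longest="${INPUT##Cookie: cf_clearance=*;}"
printf '[%s] [%s]\n' "$shortest" "$longest"    # prints "[ __cfduid=bar;] []"
```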

If you don't have that guarantee, you can approximate the same thing with a pair of parameter substitutions: take the part that precedes `cf_clearance`, take the part that follows the first semicolon after `cf_clearance`, and glue them together.

head=${INPUT%cf_clearance=*}
tail=${INPUT#*cf_clearance=*;}
echo "$head$tail"

(If you are not scared of complex substitutions, the temporary variables aren't really necessary or useful.

echo "${INPUT%cf_clearance=*}${INPUT#*cf_clearance=*;}"

This is a little dense even for my sophisticated taste, though.)
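Both forms above share one pitfall: if the input contains no `cf_clearance=...;` at all, neither `%` nor `#` removes anything, so the string comes back twice. A minimal guarded sketch, with `strip_cf_parts` as a hypothetical name and assuming at most one occurrence:

```bash
#!/usr/bin/env bash
# Hypothetical helper around the head/tail idea. Without the guard, an input
# that lacks cf_clearance comes back doubled, because % and # both leave it intact.
strip_cf_parts() {
    local input=$1 head tail
    # Glob held in a variable so the ';' needs no escaping; unquoted on the
    # right of == it is matched as a pattern, not as a literal string.
    local pat='*cf_clearance=*;*'
    if [[ $input == $pat ]]; then
        head=${input%cf_clearance=*}
        tail=${input#*cf_clearance=*;}
        printf '%s\n' "$head$tail"
    else
        printf '%s\n' "$input"
    fi
}
```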

tripleee