17

How can I include the regex match in the replacement expression in BASH?

Non-working example:

#!/bin/bash
name=joshua
echo ${name//[oa]/X\1}

I expect this to output jXoshuXa, with \1 replaced by the matched character.

This doesn't actually work, though: it outputs jX1shuX1 instead.

joshuapoehls
  • I don't see anything in my version of bash (4.1.5) about being able to do regex substitutions using the `${foo/bar/baz}` syntax. Do you have any references for why you think you should be able to do that? – C. K. Young Apr 11 '11 at 17:20
  • I'm not sure where I stumbled across it but it does work. Using my example above you can see that it is replacing the `o` and the `a` with an `X`. Pretty slick. – joshuapoehls Apr 11 '11 at 17:29
  • See http://tldp.org/LDP/abs/html/parameter-substitution.html, the description of this is about 3/4 of the way down the page. – Andrew Clark Apr 11 '11 at 17:31

3 Answers

43

This is perhaps not as intuitive as sed, and arguably quite obscure, but in the spirit of completeness: BASH will probably never support capture variables in the replacement (at least not in the usual fashion, since parentheses are used for extended pattern matching). It is still possible, however, to capture a pattern when testing with the binary operator =~, which produces an array of matches called BASH_REMATCH.

Making the following example possible:

#!/bin/bash
name='joshua'
# On a match, BASH_REMATCH[0] is the whole match and [1], [2] are the groups.
[[ $name =~ ([ao].*)([oa]) ]] && \
    echo "${name/"$BASH_REMATCH"/X${BASH_REMATCH[1]}X${BASH_REMATCH[2]}}"

The conditional match of the regular expression ([ao].*)([oa]) captures the following values to $BASH_REMATCH:

$ echo ${BASH_REMATCH[*]}
oshua oshu a

If the match succeeds, we use the ${parameter/pattern/string} expansion to search for the pattern oshua in the parameter, whose value is joshua, and replace it with the concatenation of Xoshu and Xa. However, this only works for our example string because we know what to expect.

For something that behaves more like a match-all (global) regex replacement, the following example greedily matches any not-yet-changed o or a and inserts an X before it, working from back to front.

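#!/bin/bash
name='joshua'
# Each pass finds the last o or a that is not already preceded by X.
while [[ $name =~ .*[^X]([oa]) ]]; do
    # ${BASH_REMATCH:0:-1} is the match minus its last character
    # (a negative length needs bash 4.2 or later).
    name=${name/"$BASH_REMATCH"/${BASH_REMATCH:0:-1}X${BASH_REMATCH[1]}}
done
echo "$name"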

The first iteration changes $name to joshuXa and the second to jXoshuXa, after which the condition fails and the loop terminates. This example works similarly to the lookbehind expression /(?<!X)([oa])/X\1/, in that it only cares about o or a characters that are not already prefixed with an X. One difference: because [^X] has to consume a character, an o or a at the very start of the string is never matched.

The output for both examples:

jXoshuXa
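As a more general variant of the loop above, here is a minimal sketch that scans left to right instead, so it also handles a match at the start of the string and needs no marker character such as X to detect finished work. The helper name prefix_matches and its use of the global REPLY variable to return the result are assumptions made for illustration, not anything bash provides. It also assumes a regex without anchors and without empty matches.

```shell
#!/bin/bash
# prefix_matches STRING REGEX PREFIX   (hypothetical helper)
# Stores in REPLY a copy of STRING with PREFIX inserted before every
# match of REGEX. Runs entirely in the current shell: no sed, no subshell.
prefix_matches() {
    local rest=$1 regex=$2 prefix=$3 out='' match before
    while [[ -n $rest && $rest =~ $regex ]]; do
        match=${BASH_REMATCH[0]}
        [[ -n $match ]] || break        # an empty match would loop forever
        before=${rest%%"$match"*}       # text before the first occurrence of the match
        out+=$before$prefix$match       # copy it, then the prefix, then the match itself
        rest=${rest#*"$match"}          # continue after that occurrence
    done
    REPLY=$out$rest
}

prefix_matches 'joshua' '[oa]' 'X'
echo "$REPLY"
```

With the question's input this prints jXoshuXa. Quoting "$match" inside the expansions keeps glob characters in the matched text literal. On bash 5.2 or later, where the patsub_replacement shell option is documented as enabled by default, ${name//[oa]/X&} should give the same result directly, since an unquoted & in the replacement stands for the matched text.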

nJoy!

nickl-
9
bash> name=joshua
bash> echo "$name" | sed 's/\([oa]\)/X\1/g'
jXoshuXa
Andrew Clark
2

The question "bash string substitution: reference matched subexpressions" was marked as a duplicate of this one, despite its requirement that:

"The code runs in a long loop, it should be a one-liner that does not launch sub-processes."

So the answer is:

If you really cannot afford to launch sed in a subprocess, do not use bash! Use perl instead: its read-update-output loop will be several times faster, and the difference in syntax is small. (Well, you must not forget the semicolons.)

I switched to perl, and there was only one gotcha: Unicode support was not available on one of the computers, so I had to reinstall packages.

18446744073709551615