Search and replace in bash using regular expressions

Question

I've seen this example:

hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//[0-9]/}

Which follows this syntax: ${variable//pattern/replacement}

Unfortunately the pattern field doesn't seem to support full regex syntax (if I use . or \s, for example, it tries to match the literal characters).

How can I search/replace a string using full regex syntax?

Found a related question here: http://stackoverflow.com/questions/5658085/bash-script-regular-expressions-how-to-find-and-replace-all-matches — jheddings, Oct 24 '12 at 05:37
FYI, `\s` isn't part of standard POSIX-defined regular expression syntax (neither BRE or ERE); it's a PCRE extension, and mostly not available from shell. `[[:space:]]` is the more universal equivalent. — Charles Duffy, Jul 08 '14 at 16:49
`\s` can be replaced by `[[:space:]]`, by the way, `.` by `?`, and extglob extensions to the baseline shell pattern language can be used for things like optional subgroups, repeated groups, and the like. — Charles Duffy, Feb 05 '15 at 20:25
I use this in bash version 4.1.11 on Solaris... echo ${hello//[0-9]} Notice the lack of the final slash. — Daniel Liston, Aug 24 '18 at 03:35
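
To make the shell-pattern equivalents mentioned in the comments above concrete, here is a minimal sketch. The sample string and variable names are invented for illustration, and the last substitution assumes bash with `extglob` enabled.

```shell
#!/usr/bin/env bash
shopt -s extglob                 # enables +(...), ?(...), *(...) in patterns

s='a1  b22 c333'                 # illustrative input, not from the question

no_space=${s//[[:space:]]/}      # [[:space:]] stands in for \s
masked=${s//?/.}                 # ? matches any single character, like regex .
runs=${s//+([0-9])/N}            # +([0-9]) is the extglob form of [0-9]+

printf '%s\n' "$no_space" "$masked" "$runs"
```

Each run of digits collapses to a single N in the last line because `+(...)` matches one or more repetitions as a unit.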

score 230 · Accepted Answer · edited Apr 24 '19 at 16:23

Use sed:

MYVAR=ho02123ware38384you443d34o3434ingtod38384day
echo "$MYVAR" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g'
# prints XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX

Note that the subsequent -e's are processed in order. Also, the g flag for the expression will match all occurrences in the input.
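
A small sketch of why the order of the -e expressions matters: each expression sees the output of the previous one. The input string and variable names here are illustrative only.

```shell
#!/usr/bin/env bash
input='a1b2'   # illustrative input

# Digits become x first, then the second expression also rewrites those x's.
digits_first=$(printf '%s\n' "$input" | sed -e 's/[0-9]/x/g' -e 's/[a-z]/L/g')

# Letters become L first; the digit expression then sees only the digits.
letters_first=$(printf '%s\n' "$input" | sed -e 's/[a-z]/L/g' -e 's/[0-9]/x/g')

printf '%s\n' "$digits_first" "$letters_first"
```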

You can also use your favorite tool with this method, e.g. perl or awk:

echo "$MYVAR" | perl -pe 's/[a-zA-Z]/X/g and s/[0-9]/N/g'

This may allow you to do more creative matches. For example, in the snippet above, the numeric replacement is not applied unless the first expression matched (because `and` short-circuits). And of course, you have the full language support of Perl to do your bidding...

edited Apr 24 '19 at 16:23

Charles Duffy

answered Oct 24 '12 at 05:16

jheddings

This only does a single replace as far as I can tell. Is there a way to have it replace all occurrences of the pattern like what the code I posted does? – Lanaru Oct 24 '12 at 05:21
I've updated my answer to demonstrate multiple replacements as well as global pattern matching. Let me know if that helps. – jheddings Oct 24 '12 at 05:28
Thanks so much! Out of curiosity, why did you switch from a one line version (in your original answer) to a two-liner? – Lanaru Oct 24 '12 at 05:30
Is there a reason you're using an all-caps `MYVAR`? Best practice is to save all-caps for environment variables and shell built-ins, thereby avoiding namespace conflicts. – Charles Duffy Mar 09 '14 at 15:34
Using `sed` or other external tools is expensive due to process initialization time. I especially searched for all-bash solution, because I found using bash substitutions to be more than 3x faster than calling `sed` for each item in my loop. – rr- Oct 11 '14 at 13:36
@CiroSantilli六四事件法轮功纳米比亚威视, granted, that's the common wisdom, but that doesn't make it wise. Yes, bash is slow no matter what -- but well-written bash that avoids subshells is literally orders of magnitude faster than bash that calls external tools for every tiny little task. Also, well-written shell scripts will benefit from faster interpreters (like ksh93, which has performance on par with awk), whereas for poorly-written ones there's nothing to be done. – Charles Duffy Aug 10 '15 at 15:11
To illustrate the cost of calling external tools, here are the outputs of the `time` command on one of my scripts before and after removing external tool calls. **BEFORE:** `real 1m1.209s / user 0m45.342s / sys 0m13.142s` **AFTER:** `real 0m3.433s / user 0m2.519s / sys 0m0.211s` 94% faster! – Stephan Apr 10 '20 at 07:46
I'd recommend perl instead of sed due to portability issues between the Linux version of sed and the BSD version that's used on, eg, MacOS. – Dave Dopson Apr 12 '21 at 03:40
I think it is safe to assume that if someone works on a Mac and cares about the performance of a bash script, they've probably already run `brew install coreutils` – ontek Apr 15 '21 at 16:10
If you're trying to save this output back into the variable you can do e.g. `MYVAR=$(echo "$MYVAR" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g')` – twhitney Sep 03 '21 at 18:09
When assigning to a variable you should use -n to avoid echo appending a newline, like `MYVAR=$(echo -n "$MYVAR" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g')` – LazyProphet Oct 07 '21 at 10:27
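
On saving the result back into a variable: command substitution strips trailing newlines from the captured output, so the three forms in this sketch should end up with the same value. The variable names and the sample value are illustrative assumptions.

```shell
#!/usr/bin/env bash
myvar='abc123'   # illustrative value

# echo appends a newline, but $(...) strips trailing newlines again.
with_echo=$(echo "$myvar" | sed -e 's/[0-9]/N/g')

# printf '%s' sends no trailing newline in the first place.
with_printf=$(printf '%s' "$myvar" | sed -e 's/[0-9]/N/g')

# A here-string avoids the extra echo/printf step altogether.
with_herestring=$(sed -e 's/[0-9]/N/g' <<<"$myvar")

printf '[%s]\n' "$with_echo" "$with_printf" "$with_herestring"
```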

score 167 · Answer 2 · answered Mar 07 '14 at 21:55

This actually can be done in pure bash:

hello=ho02123ware38384you443d34o3434ingtod38384day
re='(.*)[0-9]+(.*)'
while [[ $hello =~ $re ]]; do
  hello=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
done
echo "$hello"

...yields...

howareyoudoingtodday
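
To see what the loop is doing, this sketch traces it on a shortened, made-up input. Because the first `(.*)` is greedy, `[0-9]+` is left with only the last digit in the string, so each pass should remove a single digit, working from right to left. The `passes` counter is added purely for illustration.

```shell
#!/usr/bin/env bash
hello=ab12cd34          # shortened, illustrative input
re='(.*)[0-9]+(.*)'
passes=0

while [[ $hello =~ $re ]]; do
  # Show which two pieces are kept on this pass.
  printf 'pass %d: keep "%s" + "%s"\n' "$((passes + 1))" \
    "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  hello=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
  passes=$((passes + 1))
done

echo "$hello after $passes passes"
```

The loop needs one pass per digit, which is worth keeping in mind for long strings.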

answered Mar 07 '14 at 21:55

Charles Duffy

Something tells me you will love these: http://stackoverflow.com/questions/5624969/how-to-reference-captures-in-bash-regex-replacement#answer-22261643 =) – nickl- Mar 10 '14 at 10:03
`=~` is the key. But a bit clunky, given the reassignment in the loop. @jheddings' solution from 2 years prior (calling sed or perl) is another good option. – Brent Faust Jun 11 '15 at 17:16
Calling `sed` or `perl` is sensible, if using each invocation to process more than a single line of input. Invoking such a tool on the inside of a loop, as opposed to using a loop to process its output stream, is foolhardy. – Charles Duffy Jun 14 '15 at 13:59
FYI, in zsh, it's just `$match` instead of `$BASH_REMATCH`. (You can make it behave like bash with `setopt bash_rematch`.) – Marian May 03 '17 at 00:14
It's odd -- inasmuch as zsh isn't trying to be a POSIX shell, it's arguably following the letter of POSIX guidance about all-caps variables being used for POSIX-specified (shell or system-relevant) purposes and lowercase variables being reserved for application use. But inasmuch as zsh is something that *runs* applications, rather than an application itself, this decision to use application variable namespace rather than the system namespace seems awfully perverse. – Charles Duffy Oct 16 '17 at 21:16
@CharlesDuffy your answer would have been more appreciable if you could add a bit of explanation – Orar Mar 02 '18 at 05:53
Any way to get $re in this example to match a newline? Including $'\n' doesn't seem to work as it would elsewhere. – Alex Jansen Apr 27 '19 at 00:49
@AlexJohnson, ...if you ask a separate question about that with a reproducer, could you link me in? `$'\n'` works for me (tm). – Charles Duffy Apr 27 '19 at 02:27
Looks like it just needed lots of wrapping: pattern='(['$'\n''])'; [[ "$some_var" =~ $pattern ]] – Alex Jansen Apr 27 '19 at 03:16
@AlexJohnson You don't need all that. `pattern='(['$'\n''])'` is just a needlessly-verbose way to write `pattern=$'([\n])'`. Not that I'm sure why you have the parens or the brace expression either; `[[ $some_var =~ $'\n' ]]` works fine as-is. – Charles Duffy Apr 27 '19 at 15:04
@CharlesDuffy Tell that to *`spellcheck(1)`*. Or scripts that wish to have fewer dependencies. I would agree otherwise. – Pryftan Oct 19 '19 at 14:10
@Pryftan, what's that `spellcheck`? Because [`shellcheck`](https://shellcheck.net/) has no problem with `[[ $some_var =~ $'\n' ]]` -- which has no dependencies other than bash 3.2 or later; even Apple, shipping an ancient pre-GPLv3 release, provides that. – Charles Duffy Oct 19 '19 at 15:10
@CharlesDuffy Sigh. I could have sworn I fixed the ruddy auto 'correction'. Yes, I meant shellcheck. Sorry about that. Anyway what I meant: Using sed it will complain in some cases anyway. – Pryftan Oct 22 '19 at 18:17
@CharlesDuffy Is there an elegant way to deal with endless loops due to empty matches, e.g. `hello="abc123def457ghi890jkl"; re="([0-9]*)"`? – Fonic Dec 22 '22 at 21:04
@Fonic, well, the _easy_ (and generally more-correct) thing is to change your re so it can't do that; `re='(.*)([0-9]+)(.*)'` and it's moot. But having a group to match the content being removed and adding `&& [[ ${BASH_REMATCH[2]} ]]` to the while loop's conditions so it exits on a zero-length match in a group corresponding with the content being removed is an alternative. – Charles Duffy Dec 23 '22 at 14:29
@CharlesDuffy I went with a different approach as I don't have any knowledge about the input string or the regular expression in my use case (both are arbitrary and user-provided). I might post it as an answer, although it is overkill regarding the OPs original question and thus a bit out of scope. – Fonic Dec 23 '22 at 20:27
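
The zero-length-match guard described in the comments above could look roughly like this. `remove_matches` is a hypothetical helper name, and the sketch assumes the pattern passed in contains no capture groups of its own (otherwise the group numbers shift).

```shell
#!/usr/bin/env bash
# remove_matches is a hypothetical helper; $2 is assumed to be an ERE
# without capture groups, so groups 1-3 below keep their meaning.
remove_matches() {
  local s=$1 re="(.*)($2)(.*)"
  # Stop when nothing matches or when the removed group matched nothing.
  while [[ $s =~ $re ]] && [[ -n ${BASH_REMATCH[2]} ]]; do
    s=${BASH_REMATCH[1]}${BASH_REMATCH[3]}
  done
  printf '%s\n' "$s"
}

remove_matches 'abc123def457ghi890jkl' '[0-9]+'   # digits are removed
remove_matches 'abc123def457ghi890jkl' '[0-9]*'   # terminates; input unchanged
```

With `[0-9]*` the greedy first group swallows the whole string, the middle group is empty, and the guard ends the loop immediately, so the guard prevents the endless loop but does not remove anything.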

nickl- · Answer 3 · 2014-03-10T09:54:47.040

142

These examples also work in bash; there is no need to use sed:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[a-zA-Z]/X} 
echo ${MYVAR//[0-9]/N}

You can also use POSIX character classes inside bracket expressions:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[[:alpha:]]/X} 
echo ${MYVAR//[[:digit:]]/N}

output

XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX

What @Lanaru wanted to know, however, if I understand the question correctly, is why the "full" or PCRE extensions (\s \S \w \W \d \D etc.) don't work the way they do in PHP, Ruby, Python and so on. These extensions come from Perl-compatible regular expressions (PCRE) and may not be supported by other, shell-based forms of regular expressions.

These don't work:

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//\d/}


#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | sed 's/\d//g'

output with all literal "d" characters removed

ho02123ware38384you44334o3434ingto38384ay

but the following does work as expected

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | perl -pe 's/\d//g'

output

howareyoudoingtodday
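
For comparison, a sketch of the same removal with sed using POSIX character classes instead of the PCRE shorthands. The second input is a made-up string for the whitespace case.

```shell
#!/usr/bin/env bash
hello=ho02123ware38384you443d34o3434ingtod38384day

# [[:digit:]] is the POSIX spelling of \d and works in both BRE and ERE.
no_digits=$(printf '%s\n' "$hello" | sed 's/[[:digit:]]//g')

# [[:space:]] plays the role of \s (illustrative input with a space and a tab).
no_spaces=$(printf '%s\n' $'a b\tc' | sed 's/[[:space:]]//g')

printf '%s\n' "$no_digits" "$no_spaces"
```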

Hope that clarifies things a bit more. But if you are not confused yet, try this on Mac OS X, which has the REG_ENHANCED flag enabled:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day;
echo $MYVAR | grep -o -E '\d'

On most flavours of *nix you will only see the following output:

d
d
d

nJoy!

edited Mar 10 '14 at 09:54

answered Mar 07 '14 at 21:48

nickl-

Pardon? `${foo//$bar/$baz}` is **not** POSIX.2 BRE or ERE syntax -- it's fnmatch()-style pattern matching. – Charles Duffy Mar 07 '14 at 21:52
...so, whereas `${hello//[[:digit:]]/}` works, if we wanted to filter out only digits preceded by the letter `o`, `${hello//o[[:digit:]]*}` would have an entirely different behavior than the one expected (since in fnmatch patterns, `*` matches all characters, rather than modifying the immediately prior item to be 0-or-more). – Charles Duffy Mar 07 '14 at 22:01
See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_03 (and all that it incorporates by reference) for the full spec on fnmatch. – Charles Duffy Mar 07 '14 at 22:02
The point it was trying to get across is that it is not PCRE, thank you for the info will investigate. – nickl- Mar 10 '14 at 04:22
man bash: An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)). – nickl- Mar 10 '14 at 04:22
yes, `[[ $foo =~ $bar ]]` (which, you may note, I used in my answer) is ERE, but `${foo//$bar/$baz}` is not. – Charles Duffy Mar 10 '14 at 04:31
I trust the update you will find in order. My apologies I did not mean to offend... – nickl- Mar 10 '14 at 09:59
Doesn't "\d" specify "digit"s? Why is that picking "d"s? Is it because it's not "PCRE" flavor of the Regex? – aderchox Jul 16 '19 at 07:44
@aderchox you are correct, for digits you can use `[0-9]` or `[[:digit:]]` – nickl- Jul 17 '19 at 13:41
Much prefer this answer, since I would use it for something like `for file in *.txt; do mv "$file" "${file// /_}"; done` to take the spaces out of all the file names in a directory, which would be a lot more difficult to understand if piped through sed/awk/perl. – Ronald Duncan Jul 27 '22 at 19:28
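
The difference between `*` in a shell pattern and `*` in a regular expression, raised in the comments above, can be seen side by side in this sketch. The input string is made up for illustration.

```shell
#!/usr/bin/env bash
hello=foo123bar   # illustrative input

# Shell pattern: * means "any run of characters", so everything from the
# first o-followed-by-a-digit to the end of the string is removed.
glob_result=${hello//o[[:digit:]]*/}

# Regex (sed BRE): * means "zero or more of the previous item", so each
# o plus any digits directly after it is removed.
regex_result=$(printf '%s\n' "$hello" | sed 's/o[[:digit:]]*//g')

printf '%s\n' "$glob_result" "$regex_result"
```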

score 14 · Answer 4 · answered Jan 05 '17 at 21:32

If you are making repeated calls and are concerned with performance, this test reveals the bash method is ~15x faster than forking to sed, and likely faster than any other external process.

hello=123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X

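P1=$(date +%s)

# 10000 iterations, each forking a sed process
for i in {1..10000}
do
   echo "$hello" | sed 's/X//g' > /dev/null
done

P2=$(date +%s)
echo $((P2 - P1))    # seconds spent in the sed loop

# 10000 iterations using bash parameter expansion only
for i in {1..10000}
do
   echo "${hello//X/}" > /dev/null
done

P3=$(date +%s)
echo $((P3 - P2))    # seconds spent in the pure-bash loop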

answered Jan 05 '17 at 21:32

Josiah DeWitt

If you're interested in a way to reduce forks, search for the word ***newConnector*** in [this answer to *How to set a variable to the output of a command in Bash?*](https://stackoverflow.com/a/41236640/1765658) – F. Hauri - Give Up GitHub Mar 30 '19 at 07:53
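
When an external tool is still wanted, one way to keep the fork cost down is to run it once over the whole list rather than once per item. This is only a sketch: the array contents and variable names are invented, it assumes no item contains a newline, and `mapfile` needs bash 4 or later.

```shell
#!/usr/bin/env bash
# Illustrative data; assumes no item contains a newline.
items=(aXb cXXd eXf)

# One sed process handles the whole list instead of one process per item.
mapfile -t cleaned < <(printf '%s\n' "${items[@]}" | sed 's/X//g')

# Pure-bash equivalent: the substitution applies to every array element.
cleaned_bash=("${items[@]//X/}")

printf '%s\n' "${cleaned[@]}" "${cleaned_bash[@]}"
```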

score 13 · Answer 5 · edited May 23 '17 at 12:02

Use [[:digit:]] (note the double brackets) as the pattern:

$ hello=ho02123ware38384you443d34o3434ingtod38384day
$ echo ${hello//[[:digit:]]/}
howareyoudoingtodday

Just wanted to summarize the answers (especially @nickl-'s https://stackoverflow.com/a/22261334/2916086).

edited May 23 '17 at 12:02

Community

answered Aug 30 '16 at 02:25

yegeniy

Dabe Murphy · Answer 6 · 2020-07-24T18:23:50.663

I know this is an ancient thread, but it was my first hit on Google, and I wanted to share the following resub that I put together, which adds support for multiple $1, $2, etc. backreferences...

#!/usr/bin/env bash

############################################
###  resub - regex substitution in bash  ###
############################################

resub() {
    local match="$1" subst="$2" tmp

    if [[ -z $match ]]; then
        echo "Usage: echo \"some text\" | resub '(.*) (.*)' '\$2 me \${1}time'" >&2
        return 1
    fi

    ### First, convert "$1" to "$BASH_REMATCH[1]" and 'single-quote' for later eval-ing...

    ### Utility function to 'single-quote' a list of strings
    squot() { local a=(); for i in "$@"; do a+=( $(echo \'${i//\'/\'\"\'\"\'}\' )); done; echo "${a[@]}"; }

    tmp=""
    while [[ $subst =~ (.*)\${([0-9]+)}(.*) ]] || [[ $subst =~ (.*)\$([0-9]+)(.*) ]]; do
        tmp="\${BASH_REMATCH[${BASH_REMATCH[2]}]}$(squot "${BASH_REMATCH[3]}")${tmp}"
        subst="${BASH_REMATCH[1]}"
    done
    subst="$(squot "${subst}")${tmp}"

    ### Now start (globally) substituting

    while IFS= read -r line; do
        tmp=""    # reset per input line so earlier lines are not repeated in later output
        while [[ $line =~ $match(.*) ]]; do
            eval tmp='"${tmp}${line%${BASH_REMATCH[0]}}"'"${subst}"
            line="${BASH_REMATCH[$(( ${#BASH_REMATCH[@]} - 1 ))]}"
        done
        echo "${tmp}${line}"
    done
}

resub "$@"

##################
###  EXAMPLES  ###
##################

###  % echo "The quick brown fox jumps quickly over the lazy dog" | resub quick slow
###    The slow brown fox jumps slowly over the lazy dog

###  % echo "The quick brown fox jumps quickly over the lazy dog" | resub 'quick ([^ ]+) fox' 'slow $1 sheep'
###    The slow brown sheep jumps quickly over the lazy dog

###  % animal="sheep"
###  % echo "The quick brown fox 'jumps' quickly over the \"lazy\" \$dog" | resub 'quick ([^ ]+) fox' "\"\$low\" \${1} '$animal'"
###    The "$low" brown 'sheep' 'jumps' quickly over the "lazy" $dog

###  % echo "one two three four five" | resub "one ([^ ]+) three ([^ ]+) five" 'one $2 three $1 five'
###    one four three two five

###  % echo "one two one four five" | resub "one ([^ ]+) " 'XXX $1 '
###    XXX two XXX four five

###  % echo "one two three four five one six three seven eight" | resub "one ([^ ]+) three ([^ ]+) " 'XXX $1 YYY $2 '
###    XXX two YYY four five XXX six YYY seven eight

H/T to @Charles Duffy re: (.*)$match(.*)

score 1 · Answer 7 · answered Aug 12 '21 at 14:01

Set the var

hello=ho02123ware38384you443d34o3434ingtod38384day

then echo the variable with a pattern replacement applied

echo ${hello//[[:digit:]]/}

and this will print:

howareyoudoingtodday

Extra - if you'd like the opposite (to get the digit characters)

echo ${hello//[![:digit:]]/}

and this will print:

021233838444334343438384

answered Aug 12 '21 at 14:01

Vladimir Djuricic

That's pretty much the same code as the question. You're missing the part about how "the `pattern` field doesn't seem to support full regex syntax (if I use `.` or `\s`, for example, it tries to match the literal characters)." – You can't do `echo ${hello//[[:digit:]\s]/}` for example. – Adam Katz Apr 20 '22 at 03:20
@AdamKatz yeah, no biggie, it happens. Thx – Vladimir Djuricic Apr 22 '22 at 20:51

score 0 · Answer 8 · answered Nov 14 '20 at 16:43

This example takes the input hello ugly world, searches it for the regex bad|ugly, and replaces the match with nice:

#!/bin/bash

# THIS FUNCTION NEEDS THREE PARAMETERS
# arg1 = input              Example:  hello ugly world
# arg2 = search regex       Example:  bad|ugly
# arg3 = replace            Example:  nice
function regex_replace()
{
  # $1 = hello ugly world
  # $2 = bad|ugly
  # $3 = nice

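  # REGEX (POSIX ERE has no lazy quantifiers such as .*?, so the first group
  # is greedy and only the last occurrence of $2 is replaced)
  re="(.*)($2)(.*)"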

  if [[ $1 =~ $re ]]; then
    # if there is a match
    
    # ${BASH_REMATCH[0]} = hello ugly world
    # ${BASH_REMATCH[1]} = hello 
    # ${BASH_REMATCH[2]} = ugly
    # ${BASH_REMATCH[3]} = world    

    # hello + nice + world
    echo "${BASH_REMATCH[1]}$3${BASH_REMATCH[3]}"
  else    
    # if no match return original input  hello ugly world
    echo "$1"
  fi    
}

# prints 'hello nice world'
regex_replace 'hello ugly world' 'bad|ugly' 'nice'

# to save output to a variable
x=$(regex_replace 'hello ugly world' 'bad|ugly' 'nice')
echo "output of replacement is: $x"
exit
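
The function above replaces a single occurrence. A replace-all variant in pure bash might be sketched as follows. `regex_replace_all` is a hypothetical name, and the sketch assumes the regex contains no `^`/`$` anchors or other context-dependent constructs and does not match the empty string.

```shell
#!/usr/bin/env bash
# regex_replace_all is a hypothetical helper: input, ERE, replacement.
regex_replace_all() {
  local s=$1 re=$2 repl=$3 out='' m
  while [[ $s =~ $re ]]; do
    m=${BASH_REMATCH[0]}          # text of the leftmost match
    [[ -n $m ]] || break          # guard against zero-length matches
    out+=${s%%"$m"*}$repl         # text before the match, then the replacement
    s=${s#*"$m"}                  # continue with the text after the match
  done
  printf '%s\n' "$out$s"
}

regex_replace_all 'hello ugly bad world' 'bad|ugly' 'nice'
```

Working left to right and cutting the string at each match avoids relying on lazy quantifiers, which `=~` does not provide.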

score -4 · Answer 9 · edited Jan 06 '22 at 23:45

You can use Python. This will not be efficient, but it gets the job done with a somewhat more flexible syntax.

apply on file

The following Python script will replace "FROM" (but not "notFROM") with "TO".

regex_replace.py

import sys
import re

for line in sys.stdin:
    line = re.sub(r'(?<!not)FROM', 'TO', line)
    sys.stdout.write(line)

You can apply that on a text file, like

$ cat test.txt
bla notFROM
FROM FROM
bla bla
FROM bla

bla  notFROM FROM

bla FROM
bla bla


$ cat test.txt | python regex_replace.py
bla notFROM
TO TO
bla bla
TO bla

bla  notFROM TO

bla TO
bla bla

apply on variable

#!/bin/bash

hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello

PYTHON_CODE=$(cat <<END
import sys
import re

for line in sys.stdin:
    line = re.sub(r'[0-9]', '', line)
    sys.stdout.write(line)
END
)
echo $hello | python -c "$PYTHON_CODE"

output

ho02123ware38384you443d34o3434ingtod38384day
howareyoudoingtodday

I'm downvoting this because I searched for "using regular expressions in Bash." Python won't help me to set my PS1 prompt (afaik). — Slothario, Jun 25 '22 at 21:40
