254

I've seen this example:

hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//[0-9]/}

Which follows this syntax: ${variable//pattern/replacement}

Unfortunately the pattern field doesn't seem to support full regex syntax (if I use . or \s, for example, it tries to match the literal characters).

How can I search/replace a string using full regex syntax?

zoranc
Lanaru
  • Found a related question here: http://stackoverflow.com/questions/5658085/bash-script-regular-expressions-how-to-find-and-replace-all-matches – jheddings Oct 24 '12 at 05:37
  • 3
    FYI, `\s` isn't part of standard POSIX-defined regular expression syntax (neither BRE nor ERE); it's a PCRE extension, and mostly not available from shell. `[[:space:]]` is the more universal equivalent. – Charles Duffy Jul 08 '14 at 16:49
  • 2
    `\s` can be replaced by `[[:space:]]`, by the way, `.` by `?`, and extglob extensions to the baseline shell pattern language can be used for things like optional subgroups, repeated groups, and the like. – Charles Duffy Feb 05 '15 at 20:25
  • I use this in bash version 4.1.11 on Solaris... echo ${hello//[0-9]} Notice the lack of the final slash. – Daniel Liston Aug 24 '18 at 03:35
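As a rough illustration of the pattern equivalents mentioned in the comments above, the sketch below assumes bash with `extglob` enabled. The variable `text` and the helper `squeeze_spaces` are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: shell-pattern stand-ins for common regex atoms in ${var//pattern/repl}.
shopt -s extglob    # needed for +(...) and the other extended patterns

text=$'foo  bar\tbaz 42'    # hypothetical sample: two spaces, a tab, one space

# [[:space:]] plays the role of \s: replace every whitespace character.
echo "${text//[[:space:]]/_}"    # foo__bar_baz_42

# ? plays the role of the regex dot: it matches any single character.
word=cat
echo "${word//?/x}"              # xxx

# +(...) means "one or more": collapse each whitespace run to one space.
squeeze_spaces() {
  local s=$1
  printf '%s\n' "${s//+([[:space:]])/ }"
}
squeeze_spaces "$text"           # foo bar baz 42
```

These are still shell patterns, not regular expressions, so anything that needs real ERE features has to go through `[[ ... =~ ... ]]` or an external tool.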

9 Answers

230

Use sed:

MYVAR=ho02123ware38384you443d34o3434ingtod38384day
echo "$MYVAR" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g'
# prints XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX

Note that the subsequent -e's are processed in order. Also, the g flag for the expression will match all occurrences in the input.

You can also use the same approach with your favorite tool, such as perl or awk. For example:

echo "$MYVAR" | perl -pe 's/[a-zA-Z]/X/g and s/[0-9]/N/g'

This may allow you to do more creative matches. For example, in the snippet above, the numeric replacement is attempted only if the first substitution replaced something on that line, because Perl's and operator short-circuits. And of course, you have the full Perl language at your disposal.

Charles Duffy
jheddings
  • This only does a single replace as far as I can tell. Is there a way to have it replace all occurrences of the pattern like what the code I posted does? – Lanaru Oct 24 '12 at 05:21
  • I've updated my answer to demonstrate multiple replacements as well as global pattern matching. Let me know if that helps. – jheddings Oct 24 '12 at 05:28
  • Thanks so much! Out of curiosity, why did you switch from a one line version (in your original answer) to a two-liner? – Lanaru Oct 24 '12 at 05:30
  • Is there a reason you're using an all-caps `MYVAR`? Best practice is to save all-caps for environment variables and shell built-ins, thereby avoiding namespace conflicts. – Charles Duffy Mar 09 '14 at 15:34
  • 16
    Using `sed` or other external tools is expensive due to process initialization time. I especially searched for all-bash solution, because I found using bash substitutions to be more than 3x faster than calling `sed` for each item in my loop. – rr- Oct 11 '14 at 13:36
  • 9
    @CiroSantilli六四事件法轮功纳米比亚威视, granted, that's the common wisdom, but that doesn't make it wise. Yes, bash is slow no matter what -- but well-written bash that avoids subshells is literally orders of magnitude faster than bash that calls external tools for every tiny little task. Also, well-written shell scripts will benefit from faster interpreters (like ksh93, which has performance on par with awk), whereas poorly-written ones there's nothing to be done for. – Charles Duffy Aug 10 '15 at 15:11
  • In order to illustrate the expensiveness of calling external tools, here are the outputs of the `time` command on one of my scripts before and after removing external tool calls. **BEFORE:** `real 1m1.209s / user 0m45.342s / sys 0m13.142s` **AFTER:** `real 0m3.433s / user 0m2.519s / sys 0m0.211s` 94% faster! – Stephan Apr 10 '20 at 07:46
  • 1
    I'd recommend perl instead of sed due to portability issues between the Linux version of sed and the BSD version that's used on, eg, MacOS. – Dave Dopson Apr 12 '21 at 03:40
  • I think it is safe to assume that if someone works on a Mac and cares about the performance of a bash script, they've probably already run `brew install coreutils` – ontek Apr 15 '21 at 16:10
  • If you're trying to save this output back into the variable you can do e.g. `MYVAR=$(echo "$MYVAR" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g')` – twhitney Sep 03 '21 at 18:09
  • When assigning to a variable you should use -n to avoid echo appending a newline, like `MYVAR=$(echo -n "$MYVAR" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g')` – LazyProphet Oct 07 '21 at 10:27
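For saving the result back into the variable, as discussed in the comments above, a here-string avoids the `echo` pipeline altogether. This is a sketch assuming bash (here-strings are not POSIX sh); the lowercase name `myvar` is only for illustration. Command substitution strips trailing newlines from the captured output.

```shell
#!/usr/bin/env bash
myvar=ho02123ware38384you443d34o3434ingtod38384day

# Feed the value to sed on stdin with a here-string and capture the result.
# $(...) removes trailing newlines, so none ends up in the variable.
myvar=$(sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g' <<<"$myvar")

printf '%s\n' "$myvar"    # XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX
```

This still forks one `sed` process per call, so inside a tight loop the pure-bash approaches in the other answers are likely to be cheaper.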
167

This actually can be done in pure bash:

hello=ho02123ware38384you443d34o3434ingtod38384day
re='(.*)[0-9]+(.*)'
while [[ $hello =~ $re ]]; do
  hello=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
done
echo "$hello"

...yields...

howareyoudoingtodday
Charles Duffy
  • 4
    Something tells me you will love these: http://stackoverflow.com/questions/5624969/how-to-reference-captures-in-bash-regex-replacement#answer-22261643 =) – nickl- Mar 10 '14 at 10:03
  • `=~` is the key. But a bit clunky, given the reassignment in the loop. @jheddings' solution 2 years prior (calling sed or perl) is another good option. – Brent Faust Jun 11 '15 at 17:16
  • 4
    Calling `sed` or `perl` is sensible, if using each invocation to process more than a single line of input. Invoking such a tool on the inside of a loop, as opposed to using a loop to process its output stream, is foolhardy. – Charles Duffy Jun 14 '15 at 13:59
  • 3
    FYI, in zsh, it's just `$match` instead of `$BASH_REMATCH`. (You can make it behave like bash with `setopt bash_rematch`.) – Marian May 03 '17 at 00:14
  • 1
    It's odd -- inasmuch as zsh isn't trying to be a POSIX shell, it's arguably following the letter of POSIX guidance about all-caps variables being used for POSIX-specified (shell or system-relevant) purposes and lowercase variables being reserved for application use. But inasmuch as zsh is something that *runs* applications, rather than an application itself, this decision to use application variable namespace rather than the system namespace seems awfully perverse. – Charles Duffy Oct 16 '17 at 21:16
  • @CharlesDuffy your answer would have been more appreciable if you could add a bit of explanation – Orar Mar 02 '18 at 05:53
  • Any way to get $re in this example to match a newline? Including $'\n' doesn't seem to work as it would elsewhere. – Alex Jansen Apr 27 '19 at 00:49
  • @AlexJohnson, ...if you ask a separate question about that with a reproducer, could you link me in? `$'\n'` works for me (tm). – Charles Duffy Apr 27 '19 at 02:27
  • Looks like it just needed lots of wrapping: pattern='(['$'\n''])'; [[ "$some_var" =~ $pattern ]] – Alex Jansen Apr 27 '19 at 03:16
  • @AlexJohnson You don't need all that. `pattern='(['$'\n''])'` is just a needlessly-verbose way to write `pattern=$'([\n])'`. Not that I'm sure why you have the parens or the bracket expression either; `[[ $some_var =~ $'\n' ]]` works fine as-is. – Charles Duffy Apr 27 '19 at 15:04
  • @CharlesDuffy Tell that to *`spellcheck(1)`*. Or scripts that wish to have fewer dependencies. I would agree otherwise. – Pryftan Oct 19 '19 at 14:10
  • @Pryftan, what's that `spellcheck`? Because [`shellcheck`](https://shellcheck.net/) has no problem with `[[ $some_var =~ $'\n' ]]` -- which has no dependencies other than bash 3.2 or later; even Apple, shipping an ancient pre-GPLv3 release, provides that. – Charles Duffy Oct 19 '19 at 15:10
  • @CharlesDuffy Sigh. I could have sworn I fixed the ruddy auto 'correction'. Yes, I meant shellcheck. Sorry about that. Anyway what I meant: Using sed it will complain in some cases anyway. – Pryftan Oct 22 '19 at 18:17
  • @CharlesDuffy Is there an elegant way to deal with endless loops due to empty matches, e.g. `hello="abc123def457ghi890jkl"; re="([0-9]*)"`? – Fonic Dec 22 '22 at 21:04
  • 1
    @Fonic, well, the _easy_ (and generally more-correct) thing is to change your re so it can't do that; `re='(.*)([0-9]+)(.*)'` and it's moot. But having a group to match the content being removed and adding `&& [[ ${BASH_REMATCH[2]} ]]` to the while loop's conditions so it exits on a zero-length match in a group corresponding with the content being removed is an alternative. – Charles Duffy Dec 23 '22 at 14:29
  • @CharlesDuffy I went with a different approach as I don't have any knowledge about the input string or the regular expression in my use case (both are arbitrary and user-provided). I might post it as an answer, although it is overkill regarding the OPs original question and thus a bit out of scope. – Fonic Dec 23 '22 at 20:27
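The empty-match guard described in the last few comments can be sketched as follows. The function name `strip_group2` is hypothetical, and the regex passed in is assumed to have exactly three groups, with group 2 being the text to remove.

```shell
#!/usr/bin/env bash
# strip_group2 STRING ERE
# Repeatedly removes whatever group 2 matched. The loop stops as soon as
# the regex fails to match OR group 2 matched the empty string, so a
# pattern such as [0-9]* cannot spin forever.
strip_group2() {
  local s=$1 re=$2
  while [[ $s =~ $re ]] && [[ -n ${BASH_REMATCH[2]} ]]; do
    s=${BASH_REMATCH[1]}${BASH_REMATCH[3]}
  done
  printf '%s\n' "$s"
}

strip_group2 abc123def457ghi890jkl '(.*)([0-9]+)(.*)'    # abcdefghijkl

# Here group 2 can be empty; the guard ends the loop instead of hanging.
strip_group2 abc123def457ghi890jkl '(.*)([0-9]*)(.*)'
```

Because the leading `(.*)` is greedy, each pass removes a single digit from the right-hand end, which is fine for short strings but does one regex match per removed character.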
142

These examples also work in bash, with no need for sed:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[a-zA-Z]/X} 
echo ${MYVAR//[0-9]/N}

You can also use character class bracket expressions:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[[:alpha:]]/X} 
echo ${MYVAR//[[:digit:]]/N}

output

XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX

What @Lanaru wanted to know, however, if I understand the question correctly, is why the PCRE shorthand classes \s \S \w \W \d \D etc. don't work the way they do in PHP, Ruby, Python, etc. These extensions come from Perl-compatible regular expressions (PCRE) and may not be supported by the pattern and regular-expression syntaxes available in the shell.

These don't work:

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//\d/}


#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | sed 's/\d//g'

Both print the input with every literal "d" removed:

ho02123ware38384you44334o3434ingto38384ay

The following, however, works as expected:

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | perl -pe 's/\d//g'

output

howareyoudoingtodday

Hope that clarifies things a bit more. If you are not confused yet, try this on Mac OS X, which has the REG_ENHANCED flag enabled:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day;
echo $MYVAR | grep -o -E '\d'

On most flavours of *nix you will only see the following output:

d
d
d

nJoy!
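For comparison, the POSIX bracket expression `[[:digit:]]` should behave the same way in sed and grep on both GNU and BSD systems, which makes it a portable substitute for `\d`. A small sketch (the function name `strip_digits` is made up here):

```shell
#!/usr/bin/env bash
hello=ho02123ware38384you443d34o3434ingtod38384day

# Remove every digit with sed, using a POSIX character class instead of \d.
strip_digits() {
  printf '%s\n' "$1" | sed 's/[[:digit:]]//g'
}
strip_digits "$hello"    # howareyoudoingtodday

# Print each run of digits on its own line with grep -o -E.
printf '%s\n' "$hello" | grep -o -E '[[:digit:]]+'
```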

nickl-
  • 6
    Pardon? `${foo//$bar/$baz}` is **not** POSIX.2 BRE or ERE syntax -- it's fnmatch()-style pattern matching. – Charles Duffy Mar 07 '14 at 21:52
  • 8
    ...so, whereas `${hello//[[:digit:]]/}` works, if we wanted to filter out only digits preceded by the letter `o`, `${hello//o[[:digit:]]*}` would have an entirely different behavior than the one expected (since in fnmatch patterns, `*` matches all characters, rather than modifying the immediately prior item to be 0-or-more). – Charles Duffy Mar 07 '14 at 22:01
  • 2
    See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_03 (and all that it incorporates by reference) for the full spec on fnmatch. – Charles Duffy Mar 07 '14 at 22:02
  • The point it was trying to get across is that it is not PCRE, thank you for the info will investigate. – nickl- Mar 10 '14 at 04:22
  • 1
    man bash: An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)). – nickl- Mar 10 '14 at 04:22
  • yes, `[[ $foo =~ $bar ]]` (which, you may note, I used in my answer) is ERE, but `${foo//$bar/$baz}` is not. – Charles Duffy Mar 10 '14 at 04:31
  • I trust the update you will find in order. My apologies I did not mean to offend... – nickl- Mar 10 '14 at 09:59
  • Doesn't "\d" specify "digit"s? Why is that picking "d"s? Is it because it's not "PCRE" flavor of the Regex? – aderchox Jul 16 '19 at 07:44
  • 1
    @aderchox you are correct, for digits you can use `[0-9]` or `[[:digit:]]` – nickl- Jul 17 '19 at 13:41
  • Much prefer this answer since I would use it for something like. for file in *.txt; do mv "$file" "${file// /_}"; done; To take the spaces out of all the file names in a directory, which would be a lot more difficult to understand if piped through sed/awk/perl – Ronald Duncan Jul 27 '22 at 19:28
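The difference between shell patterns and regular expressions pointed out in the comments can be seen directly. This sketch assumes bash with `extglob` available; in a shell pattern `*` matches any string, while the extglob form `+(...)` is the closest analogue of a regex `+`.

```shell
#!/usr/bin/env bash
shopt -s extglob
hello=ho02123ware38384you443d34o3434ingtod38384day

# In a shell pattern, * matches ANY string, so this deletes everything
# from the first "o<digit>" to the end of the value.
echo "${hello//o[[:digit:]]*/}"        # h

# With extglob, +([[:digit:]]) means "one or more digits", which removes
# only an "o" together with the digits that directly follow it.
echo "${hello//o+([[:digit:]])/}"      # hware38384you443d34ingtod38384day
```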
14

If you are making repeated calls and are concerned about performance, this test suggests that the bash method is ~15x faster than forking to sed, and likely faster than any other external process.

hello=123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X

P1=$(date +%s)

for i in {1..10000}
do
   echo $hello | sed s/X//g > /dev/null
done

P2=$(date +%s)
echo $((P2 - P1))

for i in {1..10000}
do
   echo ${hello//X/} > /dev/null
done

P3=$(date +%s)
echo $((P3 - P2))
Josiah DeWitt
  • 1
    If you're interested in way for reducing forks, search for the word ***newConnector*** in [this answer to *How to set a variable to the output of a command in Bash?*](https://stackoverflow.com/a/41236640/1765658) – F. Hauri - Give Up GitHub Mar 30 '19 at 07:53
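Because `date +%s` only has one-second resolution, a variant of the test above using bash's `time` keyword may give finer-grained numbers. This is a sketch, not a rigorous benchmark; the function names and the iteration count are arbitrary choices, and results will vary by machine.

```shell
#!/usr/bin/env bash
hello=123456789X123456789X123456789X123456789X

# Fork sed once per iteration.
bench_sed() {
  local i
  for ((i = 0; i < $1; i++)); do
    sed 's/X//g' <<<"$hello" >/dev/null
  done
}

# Use parameter expansion only, with no external process.
bench_bash() {
  local i out
  for ((i = 0; i < $1; i++)); do
    out=${hello//X/}
  done
}

TIMEFORMAT='%R seconds'    # report elapsed wall-clock time only
time bench_sed 200
time bench_bash 200
```

The timings are written to stderr, so they still show up when stdout is redirected.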
13

Use [[:digit:]] (note the double brackets) as the pattern:

$ hello=ho02123ware38384you443d34o3434ingtod38384day
$ echo ${hello//[[:digit:]]/}
howareyoudoingtodday

Just wanted to summarize the answers (especially @nickl-'s https://stackoverflow.com/a/22261334/2916086).

Community
yegeniy
6

I know this is an ancient thread, but it was my first hit on Google, and I wanted to share the following resub that I put together, which adds support for multiple $1, $2, etc. backreferences...

#!/usr/bin/env bash

############################################
###  resub - regex substitution in bash  ###
############################################

resub() {
    local match="$1" subst="$2" tmp

    if [[ -z $match ]]; then
        echo "Usage: echo \"some text\" | resub '(.*) (.*)' '\$2 me \${1}time'" >&2
        return 1
    fi

    ### First, convert "$1" to "$BASH_REMATCH[1]" and 'single-quote' for later eval-ing...

    ### Utility function to 'single-quote' a list of strings
    squot() { local a=(); for i in "$@"; do a+=( $(echo \'${i//\'/\'\"\'\"\'}\' )); done; echo "${a[@]}"; }

    tmp=""
    while [[ $subst =~ (.*)\${([0-9]+)}(.*) ]] || [[ $subst =~ (.*)\$([0-9]+)(.*) ]]; do
        tmp="\${BASH_REMATCH[${BASH_REMATCH[2]}]}$(squot "${BASH_REMATCH[3]}")${tmp}"
        subst="${BASH_REMATCH[1]}"
    done
    subst="$(squot "${subst}")${tmp}"

    ### Now start (globally) substituting

    while IFS= read -r line; do
        tmp=""    # reset for each input line so earlier output is not repeated
        while [[ $line =~ $match(.*) ]]; do
            eval tmp='"${tmp}${line%${BASH_REMATCH[0]}}"'"${subst}"
            line="${BASH_REMATCH[$(( ${#BASH_REMATCH[@]} - 1 ))]}"
        done
        echo "${tmp}${line}"
    done
}

resub "$@"

##################
###  EXAMPLES  ###
##################

###  % echo "The quick brown fox jumps quickly over the lazy dog" | resub quick slow
###    The slow brown fox jumps slowly over the lazy dog

###  % echo "The quick brown fox jumps quickly over the lazy dog" | resub 'quick ([^ ]+) fox' 'slow $1 sheep'
###    The slow brown sheep jumps quickly over the lazy dog

###  % animal="sheep"
###  % echo "The quick brown fox 'jumps' quickly over the \"lazy\" \$dog" | resub 'quick ([^ ]+) fox' "\"\$low\" \${1} '$animal'"
###    The "$low" brown 'sheep' 'jumps' quickly over the "lazy" $dog

###  % echo "one two three four five" | resub "one ([^ ]+) three ([^ ]+) five" 'one $2 three $1 five'
###    one four three two five

###  % echo "one two one four five" | resub "one ([^ ]+) " 'XXX $1 '
###    XXX two XXX four five

###  % echo "one two three four five one six three seven eight" | resub "one ([^ ]+) three ([^ ]+) " 'XXX $1 YYY $2 '
###    XXX two YYY four five XXX six YYY seven eight

H/T to @Charles Duffy re: (.*)$match(.*)
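For a single substitution with backreferences, `BASH_REMATCH` can be used directly without `eval`. A minimal sketch, where the function name `swap_words` and the regex are illustrative only:

```shell
#!/usr/bin/env bash
# Swap the first two space-separated words, leaving the rest untouched.
swap_words() {
  local re='^([^ ]+) ([^ ]+)(.*)$'
  if [[ $1 =~ $re ]]; then
    # BASH_REMATCH[1] and [2] play the role of $1 and $2 in sed or perl.
    printf '%s\n' "${BASH_REMATCH[2]} ${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
  else
    printf '%s\n' "$1"    # no match: return the input unchanged
  fi
}

swap_words 'one two three'    # two one three
```

Keeping the regex in a variable, as here, avoids quoting surprises on the right-hand side of `=~`.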

Dabe Murphy
1

Set the var

hello=ho02123ware38384you443d34o3434ingtod38384day

Then echo it with a pattern replacement (a shell pattern, not a full regex):

echo ${hello//[[:digit:]]/}

and this will print:

howareyoudoingtodday

Extra - if you'd like the opposite (to get the digit characters)

echo ${hello//[![:digit:]]/}

and this will print:

021233838444334343438384
Vladimir Djuricic
  • That's pretty much the same code as the question. You're missing the part about how "the `pattern` field doesn't seem to support full regex syntax (if I use `.` or `\s`, for example, it tries to match the literal characters)." – You can't do `echo ${hello//[[:digit:]\s]/}` for example. – Adam Katz Apr 20 '22 at 03:20
  • @AdamKatz yeah, no biggie, it happens. Thx – Vladimir Djuricic Apr 22 '22 at 20:51
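To cover the case raised in the comment above, several POSIX classes can share one bracket expression, which is the shell-pattern way of writing something like `[\d\s]`. A small sketch with a made-up sample string:

```shell
#!/usr/bin/env bash
s=$'a1 b2\tc3'    # hypothetical sample containing a space and a tab

# Digits OR whitespace: list both classes inside one bracket expression.
echo "${s//[[:digit:][:space:]]/}"    # abc

# Negation with ! keeps only the digits.
echo "${s//[![:digit:]]/}"            # 123
```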
0

In this example, the input is hello ugly world; the function searches it for the regex bad|ugly and replaces the match with nice:

#!/bin/bash

# THIS FUNCTION NEEDS THREE PARAMETERS
# arg1 = input              Example:  hello ugly world
# arg2 = search regex       Example:  bad|ugly
# arg3 = replace            Example:  nice
function regex_replace()
{
  # $1 = hello ugly world
  # $2 = bad|ugly
  # $3 = nice

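  # REGEX
  # Note: POSIX ERE (used by =~) has no lazy quantifiers, so the original
  # (.*?) is not portable. With a greedy (.*), the LAST match is replaced
  # when the input contains several.
  re="(.*)($2)(.*)"

  if [[ $1 =~ $re ]]; then
    # if there is a match

    # ${BASH_REMATCH[0]} = 'hello ugly world'
    # ${BASH_REMATCH[1]} = 'hello '
    # ${BASH_REMATCH[2]} = 'ugly'
    # ${BASH_REMATCH[3]} = ' world'

    # hello + nice + world (quoted so whitespace is preserved)
    echo "${BASH_REMATCH[1]}$3${BASH_REMATCH[3]}"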
  else    
    # if no match return original input  hello ugly world
    echo "$1"
  fi    
}

# prints 'hello nice world'
regex_replace 'hello ugly world' 'bad|ugly' 'nice'

# to save output to a variable
x=$(regex_replace 'hello ugly world' 'bad|ugly' 'nice')
echo "output of replacement is: $x"
exit
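If the leftmost match is the one that should be replaced, one option is to take the matched text from `BASH_REMATCH[0]` and split the input around it. This is only a sketch: the name `replace_first` is made up, and it assumes the first literal occurrence of the matched text is where the regex matched, which may not hold for anchored patterns.

```shell
#!/usr/bin/env bash
# replace_first INPUT ERE REPLACEMENT
replace_first() {
  local s=$1 re=$2 rep=$3 m pre post
  if [[ $s =~ $re ]]; then
    m=${BASH_REMATCH[0]}    # text of the leftmost match
    pre=${s%%"$m"*}         # everything before its first occurrence
    post=${s#*"$m"}         # everything after its first occurrence
    printf '%s\n' "$pre$rep$post"
  else
    printf '%s\n' "$s"      # no match: return the input unchanged
  fi
}

replace_first 'hello ugly world' 'bad|ugly' 'nice'    # hello nice world
replace_first 'ugly bad ugly' 'bad|ugly' 'nice'       # nice bad ugly
```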
Tono Nam
-4

You can use Python. This will not be efficient, but it gets the job done with a somewhat more flexible syntax.

apply on file

The following Python script replaces "FROM" (but not "notFROM") with "TO".

regex_replace.py

import sys
import re

for line in sys.stdin:
    line = re.sub(r'(?<!not)FROM', 'TO', line)
    sys.stdout.write(line)

You can apply that on a text file, like

$ cat test.txt
bla notFROM
FROM FROM
bla bla
FROM bla

bla  notFROM FROM

bla FROM
bla bla


$ cat test.txt | python regex_replace.py
bla notFROM
TO TO
bla bla
TO bla

bla  notFROM TO

bla TO
bla bla

apply on variable

#!/bin/bash

hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello

PYTHON_CODE=$(cat <<END
import sys
import re

for line in sys.stdin:
    line = re.sub(r'[0-9]', '', line)
    sys.stdout.write(line)
END
)
echo $hello | python -c "$PYTHON_CODE"

output

ho02123ware38384you443d34o3434ingtod38384day
howareyoudoingtodday
Asclepius
Markus Dutschke
  • I'm downvoting this because I searched for "using regular expressions in Bash." Python won't help me to set my PS1 prompt (afaik). – Slothario Jun 25 '22 at 21:40