2

I would like to replace a character with another character in a string but only when the character occurs within a delimited substring of the string. For example, for the string:

b[b]abc[abc]bbb[bbb]

I would like to change "b" to "x" but only if it is within square brackets "[...]". Thus, the desired result is the string:

b[x]abc[axc]bbb[xxx]

My preference would be a sed or bash solution because they are in my comfort zone, but any solution that would work for Mac OS X would be fine. From searching, it seems that this can be accomplished with sed using negative lookahead and negative lookbehind, but I don't believe those features are available on the Mac version of sed.

anubhava
scolfax

7 Answers

2

With GNU sed :

$ sed -r ':a;s/(\[[^]]*)b/\1x/;ta' <<< "b[b]abc[abc]bbb[bbb]"
b[x]abc[axc]bbb[xxx]
  • :a : defines a label for the upcoming loop
  • s : substitute command
  • (\[[^]]*)b : matches and captures a [ followed by any run of non-] characters, up to a b
  • the matched string is replaced with the captured group followed by an x
  • ta : if the previous substitution succeeded, branches back to label :a (to replace any remaining b)
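To see what the loop does on each pass, here is a rough sketch that runs the same substitution once per iteration from a shell loop instead of using :a and ta. The variable names `s` and `next` are only illustrative, and `-E` is used in place of `-r` so the same line should also be accepted by the stock OS X sed.

```sh
# Apply the substitution one pass at a time and print each intermediate result.
s='b[b]abc[abc]bbb[bbb]'
while :; do
    # A single pass: at most one b after a [ is replaced.
    next=$(printf '%s\n' "$s" | sed -E 's/(\[[^]]*)b/\1x/')
    # No change means the substitution failed, which is where ta would stop branching.
    [ "$next" = "$s" ] && break
    printf '%s\n' "$next"
    s=$next
done
```

Each pass rescans the line from the start and replaces one more b, so an input with many bracketed b characters needs that many passes.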

To get GNU sed on OS X:

brew install gnu-sed

For more : How to use GNU sed on Mac OS X
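Homebrew installs GNU sed under the name gsed by default, so a script may want to pick whichever sed is available. A minimal sketch, assuming a newline-separated script and `-E` (both of which GNU and BSD sed accept); `sed_cmd` and `result` are illustrative names:

```sh
# Prefer Homebrew's GNU sed ("gsed") and fall back to the system sed.
if command -v gsed >/dev/null 2>&1; then
    sed_cmd=gsed
else
    sed_cmd=sed
fi

# Newlines instead of ; keep the label syntax acceptable to BSD sed as well.
result=$(printf '%s\n' 'b[b]abc[abc]bbb[bbb]' | "$sed_cmd" -E ':a
s/(\[[^]]*)b/\1x/
ta')
printf '%s\n' "$result"
```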

SLePort
1

This is a (rather brute-force) pure Bash solution:

raw='b[b]abc[abc]bbb[bbb]'
cooked=

declare -r delimited_rx='^(.*)\[([^][]*)\](.*)$'

while [[ $raw =~ $delimited_rx ]] ; do
    raw=${BASH_REMATCH[1]}
    printf -v cooked '[%s]%s%s' \
        "${BASH_REMATCH[2]//b/x}" \
        "${BASH_REMATCH[3]}" \
        "$cooked"
done

cooked=$raw$cooked

printf '%s\n' "$cooked"
pjh
  • Not a bad idea, but as coded it's limited to three bracketed matches. If you break out the processing of the string into a small single match regex in the loop you can check `${#BASH_REMATCH[@]}` to see if you still have any substitutions left and then parse any string. –  Apr 11 '16 at 23:39
  • @A.Danischewski, I don't understand what you mean by "three bracketed matches". Can you provide an example input string for which the code does not work? – pjh Apr 11 '16 at 23:51
  • Actually it looks like your code works fine, since it already matches the last bracketed match and continues looping. –  Apr 12 '16 at 00:10
1

Since "any solution that would work for Mac OS X would be fine", consider Perl:

perl -ple 's{\[([^][]*)\]}{ ($m=$1)=~s/b/x/g; "[$m]" }eg' <<< 'b[b]abc[abc]bbb[bbb]'
pjh
0

Using gnu-awk:

s='b[b]abc[abc]bbb[bbb]'
awk -v OFS= -v FPAT='\\[[^]]+\\]|[^[]*' '{
   for (i=1; i<=NF; i++) if ($i ~ /\[.*\]/) gsub(/b/, "x", $i)} 1' <<< "$s"

Output:

b[x]abc[axc]bbb[xxx]

On OS X, I have gnu-awk installed via Homebrew.

anubhava
0
$ awk '{ while(match($0,/\[[^][]*b[^][]*\]/)) { tgt=substr($0,RSTART,RLENGTH); gsub(/b/,"x",tgt); $0=substr($0,1,RSTART-1) tgt substr($0,RSTART+RLENGTH) } } 1' file
b[x]abc[axc]bbb[xxx]
Ed Morton
0

Thank you for the awesome solutions! All the solutions (sed, awk, and bash) work perfectly on my system. Since I'm a bit partial to sed, I find the sed solution with the t command and looping to be very nice. It needed to be modified slightly, namely by replacing ; with linefeeds, and replacing the -r option with -E, to get it to work on my OS X system:

sed -E '
:a
s/(\[[^]]*)b/\1x/
ta
' <<< "b[b]abc[abc]bbb[bbb]"

b[x]abc[axc]bbb[xxx]

I made one other modification that would assure that the substitution takes place only if a closing square bracket accompanies an opening square bracket:

sed -E '
:a
s/(\[[^]]*)b([^]]*\])/\1x\2/
ta
' <<< "b[b]abc[abc]bbb[bbb]bbb[bbb"

b[x]abc[axc]bbb[xxx]bbb[bbb
scolfax
  • The `sed` solution is nice (and it would also be nice to accept the answer that suggested it), but it doesn't generalize very well. For instance, try using it to double the delimited b characters (i.e. substitute 'bb' instead of 'x'). The full rescan that it does on each iteration could also cause a performance problem if it is applied to large input strings. – pjh Apr 11 '16 at 22:38
  • Thank you for pointing out the drawbacks to the sed approach, which includes infinite looping with a 'bb' substitution string! Your bash and perl methods, on the other hand, handled a 'bb' substitution string properly. Would you recommend one over the other as a general solution? – scolfax Apr 12 '16 at 03:42
  • @pjh Maybe I missed something, but the original intention was to replace a character `b` with another character `x`, not to double `b` -> `bb`. – SLePort Apr 12 '16 at 08:01
  • @Kenavoz, you are correct: the OP only asked for a solution that substitutes 'x' for 'b' within delimiters, and the `sed` solution (which I have upvoted) does that perfectly. I was just warning the OP to be careful trying to apply the same approach to similar problems. – pjh Apr 12 '16 at 11:32
  • @scolfax, if you are looking for an approach that generalizes, then both the Bash and Perl solutions, and several of the Awk solutions, are worth considering. The Bash solution has the advantage of not creating a subprocess, so it may be best if you are doing substitutions on small strings as part of a larger Bash program. If you are doing substitutions on large strings, or developing a standalone program to do this, I'd use Perl, but that's just a personal preference. – pjh Apr 12 '16 at 11:42
  • @pjh Thanks for the clarification. So I wouldn't call this a drawback. If the question had been about doubling characters, I would have chosen a different approach. That said, you're right about performance with lines containing many thousands of characters. – SLePort Apr 12 '16 at 11:51
  • 1
    Very sorry, my mistake in word selection. I did ask for substituting with only a single character, and thus for the question asked, I do prefer the sed approach among the several effective solutions presented (and accepted it as the answer). I also appreciate the guidance on when to use the Bash vs Perl approach, which I would use for multi-character substitution strings. – scolfax Apr 12 '16 at 13:19
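For the multi-character case raised in these comments, one way to avoid the infinite loop is to consume the string left to right, so text that has already been substituted is never rescanned. The following is only a sketch of that idea using awk's match(); `replace_in_brackets` and its arguments are hypothetical names, and it assumes the replacement contains no backslashes or `&` (which awk would treat specially).

```sh
# Hypothetical helper: replace every "b" inside balanced [...] with a replacement string.
# $1 = input string, $2 = replacement text (assumed free of backslashes and &).
replace_in_brackets() {
    printf '%s\n' "$1" | awk -v rep="$2" '{
        out = ""
        rest = $0
        # Each bracketed segment is rewritten once, then moved to out and never rescanned.
        while (match(rest, /\[[^][]*\]/)) {
            seg = substr(rest, RSTART, RLENGTH)
            gsub(/b/, rep, seg)
            out = out substr(rest, 1, RSTART - 1) seg
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

replace_in_brackets 'b[b]abc[abc]bbb[bbb]' 'bb'
```

Because only closed [...] segments match, a trailing unclosed `[bbb` is left untouched, in line with the second sed variant above.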
0
echo 'b[b]abc[abc]bbb[bbb]' | awk -v RS='[][]' 'NR%2==0{gsub("b","x")}{printf "%s%s", $0, RT}'
b[x]abc[axc]bbb[xxx]
bian
  • In a follow-up, the OP wrote that the solution must "assure that the substitution takes place only if a closing square bracket accompanies an opening square bracket". This solution does not do that. In fact, it makes no attempt to match brackets at all (e.g. `]bbb]` is converted to `]xxx]`). – pjh Apr 12 '16 at 11:26