In the following example:

echo "manoeuvre man track" | awk -v replace="man" '{ gsub(replace, ""); print }'

I get this result:

oeuvre  track

I would like it to replace text only when it finds the whole word, so as a result, I would get:

manoeuvre  track

(only the word man was removed)

--

  • I tried this with word boundaries (\b and \y), but I cannot understand how to apply them here.

  • I made it work with a for loop, but I thought maybe there is a more straightforward way using gsub?

  • I only have AWK and don't have gAWK.

# awk --version
awk version 20200816
Misha Slyusarev
  • Can you use sed? If so, take a look at this [link](https://stackoverflow.com/questions/1032023/sed-whole-word-search-and-replace) – tia.milani Mar 29 '22 at 16:01
  • What is your definition of _word_? Letters separated by blanks? Is there any punctuation? – Fravadona Mar 29 '22 at 16:31
  • @anubhava I voted to reopen. This is more about _replacing_ than it is _matching_. – glenn jackman Mar 29 '22 at 16:38
  • OK, that's a fair point @glennjackman – anubhava Mar 29 '22 at 16:45
  • You cannot use `\y` if you do not have gawk, as it is one of the [`gawk`-Specific Regexp Operators](https://www.gnu.org/software/gawk/manual/html_node/GNU-Regexp-Operators.html) – Daweo Mar 29 '22 at 18:54
  • Also, what do you mean by "only have AWK"? Can you check the version using `awk --version` and add it to your question? – Daweo Mar 29 '22 at 19:00

5 Answers

This may be what you're trying to do, using any awk, assuming you want the first full-field literal string match:

$ echo "manoeuvre man track" |
    awk -v replace="man" '
        s=index(" "$0" "," "replace" ") {
            $0 = substr($0,1,s-2) substr($0,s+length(replace))
        }
        { print }
    '
manoeuvre track

or this if you want to do a full-field regexp match for all fields:

$ echo "manoeuvre man track" |
    awk -v replace="man" '
        {
            $0=" "$0" "
            gsub(" "replace" "," ")
            gsub(/^ | $/,"",$0)
            print
        }
    '
manoeuvre track

There are lots of other possibilities...
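
One such possibility is a plain loop over the fields, comparing each one literally against the word. This is only a sketch: remove_word is a hypothetical helper name, it assumes that "word" means a whitespace-separated field, and it rebuilds the line with single spaces, so the original spacing is not preserved.

```shell
#!/bin/sh
# remove_word WORD: hypothetical helper (not from the answer above).
# Filters stdin and drops every whitespace-separated field equal to WORD.
remove_word() {
    awk -v replace="$1" '
        {
            out = ""; sep = ""
            for (i = 1; i <= NF; i++) {
                if (($i "") == replace)   # appending "" forces a string comparison
                    continue              # skip fields that are exactly the word
                out = out sep $i          # keep everything else
                sep = " "
            }
            print out
        }
    '
}

echo "manoeuvre man man man mantrack man" | remove_word man
```

With the sample line this should print "manoeuvre mantrack". Because the comparison is a string equality rather than a regexp match, a word such as m.n is treated literally.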

Ed Morton

Building on Ed's solution to handle multiple occurrences of the search word appearing next to each other:

cat file
manoeuvre man man man mantrack man

awk -v replace="man" '
{
   $0 = " " $0 " "                # pad space on each side of the line
   gsub("( " replace ")+ ", " ")  # replace 1+ repetitions of man with a space
   gsub(/^ | $/, "")              # remove space from both sides of the line
}
1' file

manoeuvre mantrack
anubhava

"Vanilla" awk is not a good fit for this task. I'd use perl:

$ echo "manoeuvre man track" |perl -pe 's/\bman\b//g'
manoeuvre  track

or if you want to pass "man" as a parameter:

$ echo "manoeuvre man track" |perl -spe 's/\b${word}\b//g' -- -word=man
manoeuvre  track

One last point: if you want the "word" treated as a literal string, use the \Q...\E regex markers:

$ echo "manoeuvre man m.n trackm.n" |perl -spe 's/\b${word}\b//g' -- -word=m.n
manoeuvre   trackm.n

$ echo "manoeuvre man m.n trackm.n" |perl -spe 's/\b\Q${word}\E\b//g' -- -word=m.n
manoeuvre man  trackm.n
glenn jackman

Since you cannot use GNU awk, you can't use word boundaries. You would need a tool that supports them, for example Perl.

If you can use Perl, you can use a solution like Glenn's, but I'd use a slightly modified approach:

#!/bin/bash
text="manoeuvre man man trackman"
replace="man"
perl -spe 's/\s*(?!\B\w)\Q${replace}\E(?<!\w\B)//g' -- -replace="$replace" <<< "$text"
## => manoeuvre trackman

See this online demo.

Details:

  • s - enables basic switch parsing, so that -replace=... after -- sets $replace
  • p - reads the input line by line and prints each line after the code has run
  • e - tells Perl that the next argument is the code to run
  • \s*(?!\B\w)\Q${replace}\E(?<!\w\B) - a regex that matches
    • \s* - zero or more whitespaces
    • (?!\B\w) - a left-hand adaptive dynamic word boundary
    • \Q${replace}\E - the "word" to replace (any string, with special or non-special chars); \Q and \E quote (escape) all regex metacharacters between them, so they are treated as literal chars
    • (?<!\w\B) - a right-hand adaptive dynamic word boundary.

Note that adaptive word boundaries will work best in cases when you do not know if the "word" one passes to the regex pattern starts or ends with special (i.e. "non-word") characters.
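
For an awk without word-boundary operators, a similar literal, whole-word check can be approximated by hand with index() and a look at the neighbouring characters. The following is only a sketch under stated assumptions: strip_word is a hypothetical helper name, a "word character" is taken to be ASCII [A-Za-z0-9_], and a match counts only when neither neighbour is a word character (which is simpler than the adaptive boundaries described above). Unlike the Perl command, it does not consume the whitespace before the match.

```shell
#!/bin/sh
# strip_word WORD: hypothetical helper. Filters stdin and deletes each
# occurrence of the literal string WORD whose left and right neighbours
# are not ASCII word characters. WORD is passed through the environment,
# so awk does not interpret backslashes in it.
strip_word() {
    w="$1" awk '
        BEGIN { w = ENVIRON["w"]; n = length(w) }
        n == 0 { print; next }                       # nothing to remove
        {
            rest = $0; out = ""; prev = ""           # prev: char just before rest
            while ((i = index(rest, w)) > 0) {
                before = (i > 1) ? substr(rest, i - 1, 1) : prev
                after  = substr(rest, i + n, 1)
                if (before !~ /[A-Za-z0-9_]/ && after !~ /[A-Za-z0-9_]/) {
                    out  = out substr(rest, 1, i - 1)   # keep text before, drop the match
                    prev = substr(w, n, 1)
                    rest = substr(rest, i + n)
                } else {
                    out  = out substr(rest, 1, i)       # keep one char and rescan
                    prev = substr(rest, i, 1)
                    rest = substr(rest, i + 1)
                }
            }
            print out rest
        }
    '
}

echo "manoeuvre man m.n trackm.n" | strip_word m.n
```

Since index() compares strings rather than regexps, no escaping of the search string is needed here.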

If you had GNU awk, you could use something like:

awk -v replace="man" '{gsub("\\<"replace"\\>", "")}1'

See the online demo:

#!/bin/bash
s="manoeuvre man track"
awk -v replace="man" '{gsub("\\<"replace"\\>", "")}1' <<< "$s"
# => manoeuvre  track

NOTE:

  • Word boundaries here are \< and \>; inside an awk string literal the \ char needs doubling, hence "\\<" and "\\>"
  • replace is a variable containing just word chars, so it is OK to use \< and \> word boundaries to wrap this value to match as a whole word (it would be more complicated if it contained non-word chars)
  • To create a pattern from a variable, you need to concatenate the values, here, it is done with "\\<"replace"\\>".
  • Note that print is replaced with 1 in the code above.
Wiktor Stribiżew
  • @Thefourthbird I added a Perl solution. – Wiktor Stribiżew Mar 29 '22 at 20:59
  • Thank you for your clarification on word boundaries. However, a Perl solution is not what I'm looking for. Though, it might be a good one :) – Misha Slyusarev Apr 19 '22 at 16:43
  • @MishaSlyusarev My Perl solution is the only precise one in the scenario when you do not know what chars your search string consists of. See my YT video about [dynamic adaptive word boundaries](https://www.youtube.com/watch?v=ngbxagE2b68.) – Wiktor Stribiżew Apr 19 '22 at 16:51

Worst case, manually hack the likely occurrences -
(edit - Thanks, anubhava)

$: echo "man man man man man - manual man is really manners man" | 
    awk -v replace="man" '{
      while ($0~"[[:space:]]"replace"[[:space:]]") {
       gsub("[[:space:]]+"replace"[[:space:]]+"," ");
      }
      gsub("^"replace"[[:space:]]+","");
      gsub("[[:space:]]+"replace"$","");
   } 1'
- manual is really manners

As anubhava pointed out in the comments, the space-on-either-side check can be tricked by multiple consecutive occurrences of the pattern, so the scan needs to make multiple passes to be sure.
If you care to try sed it can do boundary markers and doesn't have that problem -

$:  echo "man man man man man - manual man is really manners man" | sed -E 's/\<man\>/ /g; s/  +/ /g;'
 - manual is really manners

Note that multiple occurrences at the front still leave a leading space - you can always add one more check and clean-up to the pattern list.
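
A sketch of that extra clean-up, assuming a sed that accepts the non-POSIX \< and \> markers (GNU sed does). The name strip_man is a hypothetical wrapper used only for illustration, and the word is hard-coded as in the command above.

```shell
#!/bin/sh
# strip_man: hypothetical wrapper around the sed command from this answer.
# Assumes a sed that understands \< and \> (a GNU extension, not POSIX).
strip_man() {
    sed -E '
        s/\<man\>/ /g
        s/  +/ /g
        s/^ //
        s/ $//
    '
}
# The four commands, in order: blank out whole-word "man", squeeze runs of
# spaces to one, drop a leading space, drop a trailing space.

echo "man man man man man - manual man is really manners man" | strip_man
```

With the sample line this should print "- manual is really manners" with no space at either end.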

If you have GNU awk, and maybe don't care about the double spaces -

$: echo "manual man has manners" | awk '{gsub("\\<man\\>","")}1'
manual  has manners

Be sure to double-backslash the metachars. If you really want to supply the pattern as a variable on the command line, especially with GNU awk using boundary markers, watch out for double-parsing: the shell parses the argument first, and awk then processes backslash escapes in the -v value. You still need doubled backslashes if you use single quotes, or four of them if you use double quotes...

$: echo "manual man has manners" | awk -v replace='\\<man\\>' '{gsub(replace,"")}1'
manual  has manners
$: echo "manual man has manners" | awk -v replace="\\\\<man\\\\>" '{gsub(replace,"")}1'
manual  has manners
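
One way to sidestep the second round of parsing is to pass the pattern through the environment instead of -v, since awk does not process escape sequences in ENVIRON values. This is a minimal sketch: sub_env is a hypothetical helper name, and the portable pattern a\.b stands in for the gawk-only \<man\> so that it runs in any awk.

```shell
#!/bin/sh
# sub_env PATTERN: hypothetical helper. Deletes every match of the regexp
# PATTERN from stdin. The pattern reaches awk via ENVIRON, so it is used
# exactly as typed, without the escape processing applied to -v values.
sub_env() {
    pattern="$1" awk '{ gsub(ENVIRON["pattern"], "") } 1'
}

# One backslash is enough when the pattern goes through the environment:
echo "a.b axb" | sub_env 'a\.b'

# The same pattern needs two backslashes when passed with -v:
echo "a.b axb" | awk -v pattern='a\\.b' '{ gsub(pattern, "") } 1'
```

Both commands should remove only the literal a.b and leave " axb". With GNU awk, the same approach would let you write '\<man\>' with single backslashes.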
Paul Hodges