Replace the whole word with AWK

Question

In the following example:

echo "manoeuvre man track" | awk -v replace="man" '{ gsub(replace, ""); print }'

I get this result:

oeuvre  track

I would like it to replace text only when it finds the whole word, so as a result, I would get:

manoeuvre  track

(only the word man was removed)

--

I tried this with word boundaries (\b and \y), but I cannot understand how to apply them here.
I made it work with a for loop, but I thought maybe there is a more straightforward way using gsub?
I only have AWK and don't have gAWK.

# awk --version
awk version 20200816

Can you use sed? If so, take a look at this [link](https://stackoverflow.com/questions/1032023/sed-whole-word-search-and-replace) — tia.milani, Mar 29 '22 at 16:01
What is your definition of _word_? Letters separated by blanks? Is there any punctuation? — Fravadona, Mar 29 '22 at 16:31
@anubhava I voted to reopen. This is more about _replacing_ than it is _matching_. — glenn jackman, Mar 29 '22 at 16:38
You can not use `\y` if you do not have gAWK, as it is one of [`gawk`-Specific Regexp Operators](https://www.gnu.org/software/gawk/manual/html_node/GNU-Regexp-Operators.html) — Daweo, Mar 29 '22 at 18:54
Also, what do you mean by "only have AWK"? Can you check the version using `awk --version` and add it to your question? — Daweo, Mar 29 '22 at 19:00

Ed Morton · Answer 1 · 2022-03-29T17:35:45.170

4

This may be what you're trying to do, using any awk, assuming you want the first full-field literal string match:

$ echo "manoeuvre man track" |
    awk -v replace="man" '
        s=index(" "$0" "," "replace" ") {
            $0 = substr($0,1,s-2) substr($0,s+length(replace))
        }
        { print }
    '
manoeuvre track

or this if you want to do a full-field regexp match for all fields:

$ echo "manoeuvre man track" |
    awk -v replace="man" '
        {
            $0=" "$0" "
            gsub(" "replace" "," ")
            gsub(/^ | $/,"",$0)
            print
        }
    '
manoeuvre track

There are lots of other possibilities...
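
One of those other possibilities, sketched here as an illustration rather than as part of the original answer: rebuild each line field by field and skip any field that equals the word exactly. The helper name `remove_word` is made up for this example. It assumes that words are separated by blanks and that collapsing runs of blanks into single spaces is acceptable. It sticks to POSIX awk features, so it should not need gawk.

```shell
# remove_word WORD: hypothetical helper for illustration only.
# Reads lines on stdin and drops every blank-separated field equal to WORD.
remove_word() {
    awk -v replace="$1" '
    {
        out = ""; sep = ""
        for (i = 1; i <= NF; i++) {
            # Appending "" forces a string comparison, so numeric-looking
            # fields such as 1 and 1.0 are not treated as equal.
            if (($i "") == (replace "")) continue
            out = out sep $i
            sep = " "
        }
        print out
    }'
}

echo "manoeuvre man man man mantrack man" | remove_word man
```

Because the comparison is plain string equality, a word such as `m.n` is taken literally, and consecutive occurrences are all removed in a single pass.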

edited Mar 29 '22 at 17:35

answered Mar 29 '22 at 17:27

Ed Morton


@anubhava I said "assuming you want the first..." – Ed Morton Mar 29 '22 at 17:30
and it does so. It replaces the whole first word while not replacing parts of words as the OP demonstrated was a problem. – Ed Morton Mar 29 '22 at 17:32
Maybe I misunderstood, but I thought the OP wants to replace all the whole words in a line – anubhava Mar 29 '22 at 17:33
OK, I added another version that may also be what the OP wants. It's hard to tell from the question. – Ed Morton Mar 29 '22 at 17:36
++ to this answer but unfortunately it will miss 2nd `man` in `manoeuvre man man track man` – anubhava Mar 29 '22 at 17:40

score 2 · Answer 2 · answered Mar 29 '22 at 18:15

Building on Ed's solution, this handles the search word appearing several times in a row:

cat file
manoeuvre man man man mantrack man

awk -v replace="man" '
{
   $0 = " " $0 " "                # pad space on each side of the line
   gsub("( " replace ")+ ", " ")  # replace 1+ repetitions of man with a space
   gsub(/^ | $/, "")              # remove space from both sides of the line
}
1' file

manoeuvre mantrack

answered Mar 29 '22 at 18:15

anubhava


Thanks! That's a good trick there with adding extra spaces in the beginning so that the pattern in gsub would match. – Misha Slyusarev Mar 29 '22 at 20:25

glenn jackman · Answer 3 · 2022-03-29T16:54:36.320

1

"Vanilla" awk is not a good fit for this task. I'd use perl:

$ echo "manoeuvre man track" |perl -pe 's/\bman\b//g'
manoeuvre  track

or if you want to pass "man" as a parameter:

$ echo "manoeuvre man track" |perl -spe 's/\b${word}\b//g' -- -word=man
manoeuvre  track

One last point: if you want the "word" treated as a literal string, use the \Q...\E regex markers:

$ echo "manoeuvre man m.n trackm.n" |perl -spe 's/\b${word}\b//g' -- -word=m.n
manoeuvre   trackm.n

$ echo "manoeuvre man m.n trackm.n" |perl -spe 's/\b\Q${word}\E\b//g' -- -word=m.n
manoeuvre man  trackm.n

edited Mar 29 '22 at 16:54

answered Mar 29 '22 at 16:18

glenn jackman


Just FYI, word boundaries do not help much if the *word* can contain special chars (like e.g. `.`) at the start and end. – Wiktor Stribiżew Mar 29 '22 at 19:21

Wiktor Stribiżew · Answer 4 · 2022-03-29T20:58:52.147

Since you cannot use GNU awk, you can't use word boundaries. You would need a tool that supports them, for example, Perl.

If you can use Perl, you can use a solution like Glenn's, but I'd use a slightly modified approach:

#!/bin/bash
text="manoeuvre man man trackman"
replace="man"
perl -spe 's/\s*(?!\B\w)\Q${replace}\E(?<!\w\B)//g' -- -replace="$replace" <<< "$text"
## => manoeuvre trackman

See this online demo.

Details:

s - enables basic switch parsing, so that -replace=... after -- sets the $replace variable
p - reads the input line by line and prints each line after the code has run
e - marks the next argument as the Perl code to run
\s*(?!\B\w)\Q${replace}\E(?<!\w\B) - a regex that matches
- \s* - zero or more whitespaces
- (?!\B\w) - a left-hand adaptive dynamic word boundary
- \Q${replace}\E - a replace "word" (that can be any string with any special and non-special chars) where \Q and \E quote (escape) all special regex meta chars automatically (as they are treated as literal chars)
- (?<!\w\B) - a right-hand adaptive dynamic word boundary.

Note that adaptive word boundaries will work best in cases when you do not know if the "word" one passes to the regex pattern starts or ends with special (i.e. "non-word") characters.

If you had GNU awk, you could use something like:

awk -v replace="man" '{gsub("\\<"replace"\\>", "")}1'

See the online demo:

#!/bin/bash
s="manoeuvre man track"
awk -v replace="man" '{gsub("\\<"replace"\\>", "")}1' <<< "$s"
# => manoeuvre  track

NOTE:

- Word boundaries here are \< and \>, and to use a literal \ char in the awk command, it needs doubling.
- replace is a variable containing just word chars, so it is OK to use \< and \> word boundaries to wrap this value to match as a whole word (it would be more complicated if it contained non-word chars).
- To create a pattern from a variable, you need to concatenate the values; here, it is done with "\\<"replace"\\>".
- Note that print is replaced with 1 in the code above.
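
Without GNU awk, the effect of \< and \> can be approximated by hand. The following is only a sketch under stated assumptions, not part of the answer above: the helper name `remove_whole_word` is invented, a "word character" is assumed to be an ASCII letter, digit, or underscore, and the word is matched as a literal string with index() rather than as a regex. It keeps the surrounding blanks, like the gawk command does.

```shell
# remove_whole_word WORD: hypothetical helper for illustration only.
# Deletes WORD wherever it is not adjacent to a word character
# (assumed here to be ASCII letters, digits and underscore).
remove_whole_word() {
    awk -v word="$1" '
    {
        rest = $0; out = ""; prev = ""
        n = length(word)
        while (n > 0 && (p = index(rest, word)) > 0) {
            # Character before the hit: taken from rest, or, when the hit
            # starts rest, the last character consumed in the previous round.
            before = (p > 1) ? substr(rest, p - 1, 1) : prev
            after  = substr(rest, p + n, 1)
            if (before !~ /[A-Za-z0-9_]/ && after !~ /[A-Za-z0-9_]/)
                out = out substr(rest, 1, p - 1)       # whole word: drop it
            else
                out = out substr(rest, 1, p + n - 1)   # part of a longer word: keep it
            prev = substr(rest, p + n - 1, 1)
            rest = substr(rest, p + n)
        }
        print out rest
    }'
}

echo "manoeuvre man track" | remove_whole_word man
```

As with the gawk version, the blanks on both sides of the removed word are left in place, so the result contains a double space.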

Thank you for your clarification on word boundaries. However, a Perl solution is not what I'm looking for. Though, it might be a good one :) — Misha Slyusarev, Apr 19 '22 at 16:43
@MishaSlyusarev My Perl solution is the only precise one in the scenario when you do not know what chars your search string consists of. See my YT video about [dynamic adaptive word boundaries](https://www.youtube.com/watch?v=ngbxagE2b68.) — Wiktor Stribiżew, Apr 19 '22 at 16:51

Paul Hodges · Answer 5 · 2022-04-01T13:59:07.750

Worst case, manually hack the likely occurrences -
(edit - Thanks, anubhava)

$: echo "man man man man man - manual man is really manners man" | 
    awk -v replace="man" '{
      while ($0~"[[:space:]]"replace"[[:space:]]") {
       gsub("[[:space:]]+"replace"[[:space:]]+"," ");
      }
      gsub("^"replace"[[:space:]]+","");
      gsub("[[:space:]]+"replace"$","");
   } 1'
- manual is really manners

As anubhava pointed out in the comments, the space-on-either-side check can be tricked by multiple consecutive occurrences of the pattern, so the scan needs to make multiple passes to be sure.
If you care to try sed it can do boundary markers and doesn't have that problem -

$:  echo "man man man man man - manual man is really manners man" | sed -E 's/\<man\>/ /g; s/  +/ /g;'
 - manual is really manners

Note that multiple occurrences at the front still leave a leading space; you can always add one more check and cleanup to the pattern list.

If you have GNU awk, and maybe don't care about the double spaces -

$: echo "manual man has manners" | awk '{gsub("\\<man\\>","")}1'
manual  has manners

Be sure to double the backslashes on the metacharacters. If you really want to supply the pattern as a variable on the command line, especially with GNU awk using boundary markers, watch out for double parsing. You will still need doubled backslashes if you use single quotes, or four if you use double quotes...

$: echo "manual man has manners" | awk -v replace='\\<man\\>' '{gsub(replace,"")}1'
manual  has manners
$: echo "manual man has manners" | awk -v replace="\\\\<man\\\\>" '{gsub(replace,"")}1'
manual  has manners
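
The double parsing is easy to see in isolation. In this small sketch (the function names `show_v` and `show_env` are made up for the demonstration), the same string is handed to awk once with -v, where awk processes backslash escapes in the value, and once through the environment, where ENVIRON delivers it untouched.

```shell
# Hypothetical helpers for illustration only.
# show_v:   pass the string with -v; awk interprets backslash escapes in it.
# show_env: pass the string through the environment; ENVIRON keeps it as-is.
show_v()   { awk -v s="$1" 'BEGIN { print s }'; }
show_env() { word="$1" awk 'BEGIN { print ENVIRON["word"] }'; }

show_v   'a\\b'    # prints a\b   (one backslash consumed by -v)
show_env 'a\\b'    # prints a\\b  (nothing consumed)
```

Passing the value through the environment is one way to avoid counting backslashes when the string is later used as a pattern.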

Thank you for suggestions! I think "\\<" only works in gawk and doesn't work in awk. — Misha Slyusarev, Mar 29 '22 at 20:22
You can also just provide all possible pattern cases. Edited with example. — Paul Hodges, Mar 29 '22 at 20:44
It will fail for the case of `manoeuvre man man man mantrack ma` — anubhava, Mar 29 '22 at 21:09
