sed not capturing \w even with extended regexp

Question

So, I have this string 90d_ASD_A98CAWd9A_WF8, and I want to capture the \w group of characters in it and remove them. \w captures both uppercase and lowercase alphabetical characters and the underscore.

However, even though sed identifies the \w group, it doesn't replace the \w group characters in the string, even if I enable extended regexp using the -E option.

If I execute this:

echo "90d_ASD_A98CAWd9A_WF8" | sed -E s/"[\w]+"//g

I get:

90d_ASD_A98CAWd9A_WF8

However, if I explicitly mention the range of the characters I want to remove in the regexp:

echo "90d_ASD_A98CAWd9A_WF8" | sed -E s/"[A-Za-z_]+"//g

I get exactly what I'm expecting, the string with all the uppercase and lowercase letters and underscores removed:

909898

Am I using the \w group wrongly? What am I doing wrong?

First of all, `\w` is equivalent to `[0-9a-zA-Z_]`, not `[A-Za-z_]`. So if you need to remove only letters and underscores, use your second command. — markalex, Apr 22 '23 at 06:56
Also for compatibility reasons it is better to use `[[:alnum:]_]` instead of `\w` as support for meta sequences is limited. — markalex, Apr 22 '23 at 07:01
If you want to extract decimal digits only, use `sed 's/[^0-9]//g'`. Also note that not all `sed`s support `\w`. — M. Nejat Aydin, Apr 22 '23 at 07:01
@markalex understood. But in that case I should be getting an empty string in the first example, right? As all of the characters in the string `90d_ASD_A98CAWd9A_WF8` match the regex `[0-9a-zA-Z_]`. Why didn't that happen? — UrbanCentral, Apr 22 '23 at 07:04
Regarding your exact problem: `sed` treats `[\w]` as any of ``\`` or `w`. I don't know why. But if your version of `sed` supports `\w`, you can simply write `'s/\w+//g'` — markalex, Apr 22 '23 at 07:05
Inside `[]`, `\w` is just a simple `w`. Plus, `\w` also matches digits. Try `echo "90d_ASD_A98CAWd9A_WF8" | sed -E 's/[^0-9]//g'` (also note the single quotes around the `sed` script and no double quotes in it). — Renaud Pacalet, Apr 22 '23 at 07:05
Whatever is inside the `[....]` is called a [bracket expression](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05) — Jetchisel, Apr 22 '23 at 07:20
Actually, `[\w]` in a POSIX regex [matches both ``\`` and `w`](https://ideone.com/7WBxQ7). The [reason](https://stackoverflow.com/a/46021796/3832970) is simple: Perl-like shorthand character classes are not treated as such inside POSIX bracket expressions. — Wiktor Stribiżew, Apr 22 '23 at 09:50
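
The points made in the comments above can be checked with a short sketch. It assumes a POSIX shell and a `sed` that accepts `-E`. The variable names (`s`, `bracket_w`, `all_word`, `digits_only`) and the test input `a\w_b` are made up for illustration and are not from the question. The sketch uses `[[:alnum:]_]` rather than `\w`, since `\w` is an extension that not every `sed` supports.

```shell
# Sample string from the question.
s='90d_ASD_A98CAWd9A_WF8'

# Inside a bracket expression, \w is not a shorthand class:
# [\w] matches a literal backslash or a lowercase "w".
# The hypothetical input a\w_b contains both, so they are removed.
bracket_w=$(printf '%s\n' 'a\w_b' | sed -E 's/[\w]+//g')   # a_b

# Portable spelling of "word characters": letters, digits, underscore.
# Every character of the sample matches, so nothing is left.
all_word=$(printf '%s\n' "$s" | sed -E 's/[[:alnum:]_]+//g')   # empty

# Keep only the decimal digits by deleting everything else.
digits_only=$(printf '%s\n' "$s" | sed -E 's/[^0-9]+//g')   # 909898

printf '[%s]\n' "$bracket_w" "$all_word" "$digits_only"
```

The sample string contains no backslash and no lowercase `w`, which would explain why the original `[\w]+` command printed it unchanged.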
