How to extract part of string in Bash using regex

Question

I have been trying to extract part of a string in Bash. I'm running it on a Mac.

Pattern of input string:

An optional random word followed by a /.
A keyword (def, foo, or bar) followed by a hyphen (-) and a number of 2 to 6 digits.
The number is followed by another hyphen and a few hyphen-separated words.

Sample inputs and outputs:

abc/def-1234-random-words // def-1234
bla/foo-12-random-words // foo-12
bar-12345-random-words // bar-12345

So I tried the following commands to extract it, but for some reason they return the entire string.

extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-[^-]*\).*/\1/g'`
// and
extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-\d{2,6}\).*/\1/g'`

I also tried to make it case-insensitive using the I flag, but that threw an error:

: bad flag in substitute command: 'I'

Following are the references I tried:

@Barmar I noticed some weird behaviour around `\d`, hence I moved to `[^-]*`. That matched, but it always returned the entire string. I'll read more about it – Rajesh Oct 06 '21 at 15:22

score 4 · Answer 1 · answered Oct 06 '21 at 15:21

You can use the -E option to enable extended regular expressions; then you don't have to escape ( and |.

echo abc/def-1234-random-words  | sed -E -e 's/.*((def|bar|foo)-[^-]*).*/\1/g'
def-1234
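A likely reason the original commands return the whole line is that sed prints its input unchanged whenever the substitution does not match, and, as far as I know, BSD sed does not treat `\|` as alternation in basic regular expressions, nor does sed treat `\d` as a digit class. Below is a minimal sketch that prints only on a successful match by combining -n with the p flag. The helper name `extract_key` is hypothetical, and the pattern assumes the number is always followed by a hyphen, as in the samples.

```shell
# extract_key is a hypothetical helper name used only for this illustration.
# It reads lines on stdin and prints the keyword-number part of each match.
extract_key() {
  # -n suppresses automatic printing; the trailing p prints only lines
  # where the substitution succeeded, so non-matching input yields nothing.
  sed -nE 's/.*((def|bar|foo)-[0-9]{2,6})-.*/\1/p'
}

printf '%s\n' 'abc/def-1234-random-words' 'bla/foo-12-random-words' \
  'bar-12345-random-words' 'no-keyword-here' | extract_key
```

With the three sample inputs this prints def-1234, foo-12 and bar-12345, and nothing for the last line.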

Barmar


This along with `gsed` for case-insensitivity flag `/I` solved my issue. Thanks a TON! – Rajesh Oct 06 '21 at 16:19

anubhava · Accepted Answer · 2021-10-06T15:26:53.473

2

This GNU sed command should work with the ignore-case flag:

sed -E 's~^(.*/){0,1}((def|foo|bar)-[0-9]{2,6})-.*~\2~I' file

def-1234
foo-12
bar-12345

This sed command matches:

^(.*/){0,1}: Match an optional prefix ending in / at the start of the line
(: Start capture group #2
- (def|foo|bar): Match def or foo or bar
- -: Match a -
- [0-9]{2,6}: Match 2 to 6 digits
): End capture group #2
-.*: Match a - followed by anything up to the end of the line
The replacement \2 is the text captured in group #2
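Where the I flag is unavailable (for example with BSD sed), one alternative is to spell out both cases of each keyword letter with bracket expressions. The following is only a sketch of that idea. The helper name `extract_key_ci` is hypothetical, and `(.*/)?` is used as an equivalent of `(.*/){0,1}`.

```shell
# extract_key_ci is a hypothetical helper name, not part of the answer.
extract_key_ci() {
  # Each keyword letter is written as a two-character bracket expression,
  # which emulates the I flag using only POSIX extended-regex syntax.
  # -n plus the p flag prints a line only when the substitution matched.
  sed -nE 's~^(.*/)?(([Dd][Ee][Ff]|[Ff][Oo][Oo]|[Bb][Aa][Rr])-[0-9]{2,6})-.*~\2~p'
}

printf '%s\n' 'abc/DEF-1234-random-words' 'bla/Foo-12-random-words' | extract_key_ci
```

This keeps the original letter case in the output (DEF-1234 and Foo-12 here). The pattern gets verbose quickly, so it is only practical for a short keyword list.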

Or you may use this awk:

awk -v IGNORECASE=1 -F / 'match($NF, /^(def|foo|bar)-[0-9]{2,6}-/) {print substr($NF, 1, RLENGTH-1)}' file

def-1234
foo-12
bar-12345

Awk explanation:

-v IGNORECASE=1: Enable case-insensitive matching (IGNORECASE is a GNU awk feature)
-F /: Use / as the field separator
match($NF, /^(def|foo|bar)-[0-9]{2,6}-/): Match the regex ^(def|foo|bar)-[0-9]{2,6}- against $NF, the last field when splitting on / (this ignores any text before the /)
If the match succeeds, substr prints the text from position 1 to RLENGTH-1 (the match includes the - after the digits, so the final character is dropped)
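For an awk without IGNORECASE (BSD awk or mawk, for instance), a similar effect can be sketched by matching against a lowercased copy of the last field and printing from the original. The helper name `extract_last_field` is hypothetical. The digit count is derived from RLENGTH instead of `{2,6}` because some older awk builds reportedly lack interval expressions; this relies on every keyword being three letters long.

```shell
# extract_last_field is a hypothetical helper name for this illustration.
extract_last_field() {
  awk -F/ '{
    last  = $NF                  # text after the final /
    lower = tolower(last)        # case-folded copy used only for matching
    if (match(lower, /^(def|foo|bar)-[0-9]+-/)) {
      digits = RLENGTH - 5       # minus 3-letter keyword and the two hyphens
      if (digits >= 2 && digits <= 6)
        print substr(last, 1, RLENGTH - 1)   # drop the trailing hyphen
    }
  }'
}

printf '%s\n' 'abc/DEF-1234-random-words' 'bar-12345-random-words' | extract_last_field
```

Because tolower does not change the length of ASCII text, the offsets found in the lowercased copy apply to the original field, so the output keeps its original case.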

edited Oct 06 '21 at 15:26

answered Oct 06 '21 at 15:18


Could you please also add an explanation? What does $NF mean, and is this case-sensitive? – Rajesh Oct 06 '21 at 15:20
I am going to add one. Meanwhile, check the `sed` command, which does case-insensitive matching – anubhava Oct 06 '21 at 15:23
The weird thing is, the sed approach still throws this error: **: bad flag in substitute command: 'I'**. Is it environment-specific? I'm using zsh in the Mac terminal – Rajesh Oct 06 '21 at 16:01
Yes, as I mentioned, that requires GNU sed. `sed` on Mac is BSD sed, which doesn't support `/I`. I am also on a Mac but have GNU sed installed using Homebrew – anubhava Oct 06 '21 at 16:03

score 0 · Answer 3 · answered Oct 24 '22 at 16:00

Use grep with the --only-matching option (shorthand -o).

grep --only-matching --extended-regexp '(foo|bar|def)-[0-9]{2,6}' <<EOF
abc/def-1234-random-words
bla/foo-12-random-words
bar-12345-random-words
EOF
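To store the result in a variable as in the question, the grep call can be wrapped in command substitution. This is a sketch only. The helper name `first_key` is hypothetical, and -i is added here for the case-insensitivity the question asks about.

```shell
# first_key is a hypothetical helper name for this illustration.
first_key() {
  # -o prints only the matched text, -i ignores case,
  # -E enables (a|b) alternation and {m,n} repetition;
  # head keeps the first match in case the input contains several.
  printf '%s\n' "$1" | grep -oiE '(foo|bar|def)-[0-9]{2,6}' | head -n 1
}

extractedValue=$(first_key 'abc/def-1234-random-words')
echo "$extractedValue"
```

Because the pattern is unanchored, a number longer than six digits is truncated to its first six digits, not rejected. Requiring the trailing hyphen, as the sed and awk answers do, avoids that.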

N1ngu

