
I have been trying to extract part of a string in bash, on a Mac.

Pattern of input string:

  • Some random word followed by a /. This part is optional.
  • A keyword (def, foo, or bar) followed by a hyphen (-) followed by a number of 2 to 6 digits.
  • The number is followed by another hyphen and a few hyphen-separated words.

Sample inputs and outputs:

abc/def-1234-random-words // def-1234
bla/foo-12-random-words // foo-12
bar-12345-random-words // bar-12345

So I tried the following commands to fetch it, but for some reason they return the entire string.

extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-[^-]*\).*/\1/g'`
// and
extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-\d{2,6}\).*/\1/g'`

I also tried to make it case-insensitive using the I flag, but that threw an error:

: bad flag in substitute command: 'I'


Following are the references I tried:

oguz ismail
Rajesh
  • `sed` doesn't support `\d` for digits, you can use `[0-9]` – Barmar Oct 06 '21 at 15:19
  • @Barmar i noticed some weird behaviour around `\d`. Hence i moved to `[^-]*`. It used to match it but always returned entire string. But I'll read more about it – Rajesh Oct 06 '21 at 15:22

3 Answers


You can use the -E option to use extended regular expressions, then you don't have to escape ( and |.

echo abc/def-1234-random-words  | sed -E -e 's/.*((def|bar|foo)-[^-]*).*/\1/g'
def-1234
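
As a minimal sketch of the same idea applied to all three sample inputs from the question: the function name `extract_key` is hypothetical, and `[^-]*` is swapped for `[0-9]{2,6}` so that only digits are accepted after the keyword. The trailing `g` flag is dropped because the pattern already consumes the whole line.

```shell
# Hypothetical helper: prints the "keyword-number" part of its first argument.
extract_key() {
  # -E enables extended regular expressions, so ( ) | { } need no backslashes.
  printf '%s\n' "$1" | sed -E 's/.*((def|bar|foo)-[0-9]{2,6}).*/\1/'
}

# Run it against the sample inputs from the question.
for s in abc/def-1234-random-words bla/foo-12-random-words bar-12345-random-words; do
  extract_key "$s"
done
```

With the sample inputs this should print def-1234, foo-12 and bar-12345, one per line. A line that does not match is printed unchanged, since the substitution simply does not apply.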
Barmar

This GNU sed command should work with the ignore-case flag (I):

sed -E 's~^(.*/){0,1}((def|foo|bar)-[0-9]{2,6})-.*~\2~I' file

def-1234
foo-12
bar-12345

This sed matches:

  • (.*/){0,1}: Optionally match everything up to a / at the start
  • (: Start capture group #2
    • (def|foo|bar): Match def or foo or bar
    • -: Match a -
    • [0-9]{2,6}: Match 2 to 6 digits
  • ): End capture group #2
  • -.*: Match a - followed by anything until the end of the line
  • The replacement is the value captured in group #2
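
Where the I flag is not available (for example, the BSD sed that ships with macOS), one possible workaround is to spell out both cases of each letter in bracket expressions. This is only a sketch under that assumption; the function name `extract_key_ci` is hypothetical.

```shell
# Hypothetical helper: case-insensitive extraction without the GNU-only I flag.
extract_key_ci() {
  # Each keyword letter is written as [Xx], so DEF, Def and def all match.
  # (.*/)? optionally consumes everything up to the last slash;
  # group 2 holds the keyword, the hyphen and the 2 to 6 digits.
  printf '%s\n' "$1" |
    sed -E 's~^(.*/)?(([Dd][Ee][Ff]|[Ff][Oo][Oo]|[Bb][Aa][Rr])-[0-9]{2,6})-.*~\2~'
}

extract_key_ci 'abc/DEF-1234-random-words'
extract_key_ci 'Bar-12345-random-words'
```

The output keeps the original letter case (DEF-1234 and Bar-12345 here), which lowercasing the input beforehand would not.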

Or you may use this awk (IGNORECASE is specific to GNU awk):

awk -v IGNORECASE=1 -F / 'match($NF, /^(def|foo|bar)-[0-9]{2,6}-/) {print substr($NF, 1, RLENGTH-1)}' file

def-1234
foo-12
bar-12345

Awk explanation:

  • -v IGNORECASE=1: Enable case-insensitive matching (GNU awk only)
  • -F /: Use / as field separator
  • match($NF, /^(def|foo|bar)-[0-9]{2,6}-/): Match the regex ^(def|foo|bar)-[0-9]{2,6}- against $NF, the last field when splitting on / (this ignores any text before the /)
  • If the match succeeds, use substr to print $NF from position 1 for RLENGTH-1 characters (the match includes the - after the digits, which we drop)
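
For an awk without IGNORECASE (such as the one shipped with macOS), a rough equivalent is to match against a lowercased copy of the field while printing from the original. This is a sketch, not the answer's own method: the function name `extract_key_awk` is hypothetical, and the digit count is spelled out with `?` instead of `{2,6}` on the assumption that some older awk builds lack interval expressions.

```shell
# Hypothetical helper: case-insensitive extraction using only POSIX awk features.
extract_key_awk() {
  printf '%s\n' "$1" |
    awk -F / '
      # tolower() is applied only for matching; RLENGTH still describes $NF,
      # because lowercasing does not change the length of the text.
      match(tolower($NF), /^(def|foo|bar)-[0-9][0-9][0-9]?[0-9]?[0-9]?[0-9]?-/) {
        # Drop the trailing hyphen that the regex consumed.
        print substr($NF, 1, RLENGTH - 1)
      }'
}

extract_key_awk 'bla/FOO-12-random-words'
```

Lines that do not match produce no output at all, unlike the sed variants, which echo the input unchanged.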
anubhava
  • Could you please also add explanation? What $NF means and is this case sensitive? – Rajesh Oct 06 '21 at 15:20
  • I am going to add one. Meanwhile, check the `sed` command, which does ignore-case matching – anubhava Oct 06 '21 at 15:23
  • Weird thing is, sed approach is still throwing this error: **: bad flag in substitute command: 'I'**. Is it environment specific? I'm using ZSH over Mac terminal – Rajesh Oct 06 '21 at 16:01
  • Yes, as I mentioned, that requires GNU sed. `sed` on Mac is BSD and that doesn't support `/I`. I am also on Mac but have GNU sed installed using Homebrew – anubhava Oct 06 '21 at 16:03

Use grep with the --only-matching option (shorthand -o).

grep --only-matching --extended-regexp '(foo|bar|def)-[0-9]{2,6}' <<EOF
abc/def-1234-random-words
bla/foo-12-random-words
bar-12345-random-words
EOF
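
To capture the result in a variable, as the question does with `extractedValue`, the grep call can be wrapped in a command substitution. A minimal sketch: the function name `extract_key_grep` is hypothetical, `-o`, `-i` and `-E` are the short forms of `--only-matching`, `--ignore-case` and `--extended-regexp`, and `head -n 1` is added on the assumption that only the first match is wanted.

```shell
# Hypothetical helper: prints the first keyword-number match in its argument.
extract_key_grep() {
  # -o prints only the matched text, -i ignores case, -E enables extended regex.
  printf '%s\n' "$1" | grep -o -i -E '(foo|bar|def)-[0-9]{2,6}' | head -n 1
}

extractedValue=$(extract_key_grep 'abc/def-1234-random-words')
printf '%s\n' "$extractedValue"
```

If nothing matches, `extractedValue` ends up empty, which is easy to test with `[ -z "$extractedValue" ]`.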
N1ngu