
In a Bash script I'm writing, I need to capture the path (e.g. /path/to/my/file.c) and the line number (e.g. 93) from lines like these:

0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93).
0xffffffc0006e0584 is in another_function(char *arg1, int arg2)  (/path/to/my/other_file.c:94).

With the help of regex101.com, I've managed to create this Perl regex:

^(?:\S+\s){1,5}\((\S+):(\d+)\)

but I hear that Bash doesn't understand \d or ?:, so I came up with this:

^([:alpha:]+[:space:]){1,5}\(([:alpha:]+):([0-9]+)\)

But when I try it out:

line1="0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."
regex="^([:alpha:]+[:space:]){1,5}\(([:alpha:]+):([0-9]+)\)"
[[ $line1 =~ $regex ]]
echo ${BASH_REMATCH[0]}

I don't get any match. What am I doing wrong? How can I write a Bash-compatible regex to do this?

maindoor
  • Try it like this `^([[:alnum:]_]+[[:space:]]){1,5}\(((\/[[:alpha:]]+)+)\.[[:alpha:]]:([[:digit:]]+)\)\.$` https://regex101.com/r/Lra9ue/1 The values are in group 2 and 4 `echo ${BASH_REMATCH[2]}` and `echo ${BASH_REMATCH[4]}` – The fourth bird Sep 04 '19 at 23:15
  • Thanks @Thefourthbird, any reason why you had to partition the filename and the extension ? – maindoor Sep 05 '19 at 00:01
  • That is because the group that repeats `/` followed by 1+ alpha chars is what matches the path-like structure. A repeated group only holds its last iteration, in this case `/file`, so the extension is matched separately. – The fourth bird Sep 05 '19 at 00:12

2 Answers


In the first pattern you use \S+, which matches one or more non-whitespace characters. That is a broad match and also matches, for example, /, which the second pattern does not account for.

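The second pattern starts with [:alpha:], but the first character of the line is 0, which is a digit. POSIX character classes are also only valid inside a bracket expression, so the class has to be written as [[:alpha:]]; a bare [:alpha:] is read as a set of the literal characters :, a, l, p and h. You could use [[:alnum:]] instead, and since the repetition should also match _, add that to the bracket expression as well.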
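To make the difference between these classes concrete, here is a minimal sketch. The helper name `try` and the variable names are made up for illustration and are not part of the question.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints "match" or "no match" for a string and an ERE.
try() {
    local re=$2
    if [[ $1 =~ $re ]]; then echo "match"; else echo "no match"; fi
}

# The address contains digits, so an alphabetic-only class cannot cover it.
alpha_only=$(try "0xffffffc0006e0584" '^[[:alpha:]]+$')
# Alphanumerics cover the address.
alnum_addr=$(try "0xffffffc0006e0584" '^[[:alnum:]]+$')
# Adding _ to the bracket expression covers function names.
alnum_us=$(try "some_function" '^[[:alnum:]_]+$')
# A bare [:alpha:] is just a set of the characters : a l p h, so "z" is not in it.
bare_class=$(try "z" '^[:alpha:]$')

printf '%s\n' "alpha only: $alpha_only" "alnum: $alnum_addr" \
              "alnum + _: $alnum_us" "bare class: $bare_class"
```

Only the second and third checks report a match.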

Note that when a quantifier is applied to a capturing group, the group holds only the value of its last iteration. So with {1,5} the group serves only to repeat the pattern; after the match, group 1 holds some_function followed by a space.
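A small sketch of that behaviour, using only the prefix of the pattern up to the opening parenthesis (the variable names `whole` and `last` are chosen here for illustration):

```bash
#!/usr/bin/env bash
line1="0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."
# Only the repeated "word + whitespace" group and the literal "(".
regex='^([[:alnum:]_]+[[:space:]]){1,5}\('

if [[ $line1 =~ $regex ]]; then
    whole=${BASH_REMATCH[0]}   # everything the prefix matched
    last=${BASH_REMATCH[1]}    # only the final iteration of the repeated group
    printf 'whole match: [%s]\n' "$whole"
    printf 'group 1:     [%s]\n' "$last"
fi
```

The group iterates four times here, but `BASH_REMATCH[1]` keeps only the last iteration, including its trailing space.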

You might use:

^([[:alnum:]_]+[[:space:]]){1,5}\(((/[[:alpha:]]+)+\.[[:alpha:]]):([[:digit:]]+)\)\.$


Your code could look like

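line1="0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."
regex="^([[:alnum:]_]+[[:space:]]){1,5}\(((/[[:alpha:]]+)+\.[[:alpha:]]):([[:digit:]]+)\)\.$"
# Leave $regex unquoted on the right-hand side so it is treated as a regex, not a literal string
if [[ $line1 =~ $regex ]]; then
    echo "${BASH_REMATCH[2]}"   # path
    echo "${BASH_REMATCH[4]}"   # line number
fi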

Result

/path/to/my/file.c
93

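Or a slightly shorter version using \S, with the values in groups 2 and 3. Note that \S is not part of POSIX ERE; it works where Bash is linked against a regex library that supports it as an extension (glibc does), and [^[:space:]] is the portable equivalent.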

^([[:alnum:]_]+[[:space:]]){1,5}\((\S+\.[[:alpha:]]):([[:digit:]]+)\)\.$

Explanation

  • ^ Start of string
  • ([[:alnum:]_]+[[:space:]]){1,5} Capture group 1, repeated 1-5 times: match 1+ alphanumeric chars or underscores followed by a whitespace char
  • \( match (
  • (\S+\.[[:alpha:]]) Capture group 2 Match 1+ non whitespace chars, . and an alphabetic character
  • : Match :
  • ([[:digit:]]+) Capture group 3 Match 1+ digits
  • \)\. Match ).
  • $ End of string
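As a sketch of the same pattern without the non-POSIX \S, the bracket expression [^[:space:]] can be substituted. The second input line below is made up for illustration (it combines the simple function name with the underscore filename from the question) to show that this version is not limited to purely alphabetic path components.

```bash
#!/usr/bin/env bash
# Same structure as the shorter pattern, with \S replaced by [^[:space:]].
regex='^([[:alnum:]_]+[[:space:]]){1,5}\(([^[:space:]]+\.[[:alpha:]]):([[:digit:]]+)\)\.$'

line1="0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."
# Hypothetical line: underscore in the file name.
line2="0xffffffc0006e0584 is in some_function (/path/to/my/other_file.c:94)."

if [[ $line1 =~ $regex ]]; then
    path1=${BASH_REMATCH[2]}
    num1=${BASH_REMATCH[3]}
    echo "$path1 $num1"
fi

if [[ $line2 =~ $regex ]]; then
    path2=${BASH_REMATCH[2]}
    num2=${BASH_REMATCH[3]}
    echo "$path2 $num2"
fi
```

The prefix group still only accepts alphanumerics and underscores, so a line whose function name is followed by an argument list, like the second sample line in the question, would need a looser prefix.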

See this page about bracket expressions


The fourth bird

You are right, Bash uses POSIX ERE and does not support \d shorthand character class, nor does it support non-capturing groups. See more regex features unsupported in POSIX ERE/BRE in this post.
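A quick way to check this on your own system is sketched below; the variable names are illustrative. The two POSIX spellings of "digits" should match, while the Perl-style \d is not expected to behave as a digit class.

```bash
#!/usr/bin/env bash
digits="93"
posix_re='^[0-9]+$'        # plain range
class_re='^[[:digit:]]+$'  # POSIX character class inside a bracket expression
perl_re='^\d+$'            # Perl-style shorthand, not part of POSIX ERE

[[ $digits =~ $posix_re ]] && posix_ok=yes || posix_ok=no
[[ $digits =~ $class_re ]] && class_ok=yes || class_ok=no
[[ $digits =~ $perl_re ]]  && perl_ok=yes  || perl_ok=no

printf '%s\n' "[0-9]: $posix_ok" "[[:digit:]]: $class_ok" "\\d: $perl_ok"
```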

Use

.*\((.+):([0-9]+)\)

Or even (if you need to grab the first (...) substring in a string):

\(([^()]+):([0-9]+)\)

Details

  • .* - any 0+ chars, as many as possible (may be omitted, only necessary if there are other (...) substrings and you only need to grab the last one)
  • \( - a ( char
  • (.+) - Group 1 (${BASH_REMATCH[1]}): any 1+ chars as many as possible
  • : - a colon
  • ([0-9]+) - Group 2 (${BASH_REMATCH[2]}): 1+ digits
  • \) - a ) char.

Bash demo:

test='0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93).'
reg='.*\((.+):([0-9]+)\)'
# reg='\(([^()]+):([0-9]+)\)' # This also works for the current scenario
if [[ $test =~ $reg ]]; then
    echo "${BASH_REMATCH[1]}"   # path
    echo "${BASH_REMATCH[2]}"   # line number
fi

Output:

/path/to/my/file.c
93
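
To apply the second pattern to several lines at once, something along these lines should work. It is a sketch that feeds the two sample lines from the question through a here-document; the array names `paths` and `nums` are made up for illustration.

```bash
#!/usr/bin/env bash
re='\(([^()]+):([0-9]+)\)'
paths=()
nums=()

while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
        paths+=("${BASH_REMATCH[1]}")
        nums+=("${BASH_REMATCH[2]}")
    fi
done <<'EOF'
0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93).
0xffffffc0006e0584 is in another_function(char *arg1, int arg2)  (/path/to/my/other_file.c:94).
EOF

for i in "${!paths[@]}"; do
    printf '%s %s\n' "${paths[i]}" "${nums[i]}"
done
```

In the second line the argument list `(char *arg1, int arg2)` is skipped because it is not followed by a colon and digits, so the match falls through to the parenthesised path.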
Wiktor Stribiżew