
I'm trying to capture some input with a regex in Bash, but BASH_REMATCH comes back empty.

#!/usr/bin/env /bin/bash
INPUT=$(cat input.txt)
TASK_NAME="MailAccountFetch"

MATCH_PATTERN="(${TASK_NAME})\s+([0-9]{4}-[0-9]{2}-[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2})"

while read -r line; do
    if [[ $line =~ $MATCH_PATTERN ]]; then
        TASK_RESULT=${BASH_REMATCH[3]}
        TASK_LAST_RUN=${BASH_REMATCH[2]}
        TASK_EXECUTION_DURATION=${BASH_REMATCH[4]}
    fi
done <<< "$INPUT"

My input is:

    MailAccountFetch                         2017-03-29 19:00:00  Success      5.0 Second(s)      2017-03-29 19:03:00

While debugging the script (VS Code with the Bash Debug extension), I can see that the input line matches, because execution enters the if block, but BASH_REMATCH is not populated with my two capture groups.

I'm on:

GNU bash, version 4.4.0(1)-release (x86_64-pc-linux-gnu)

What could be the issue?

LATER EDIT


Accepted Answer

I'm accepting the most explanatory answer.

What finally resolved the issue:

The bashdb/VS Code environment was causing the empty BASH_REMATCH; the code works fine when run on its own.

rocky
Alex Culea
  • Answers on the vscode-bash-debug GitHub repository indicate that Bash 4.4.20 fixes this problem, but I tried Bash 4.4 with the cumulative patches through 20 and still had the same problem. Here's the link: https://github.com/rogalmic/vscode-bash-debug/issues/113 – cycollins Apr 23 '20 at 02:45

2 Answers


As Cyrus shows in his answer, a simplified version of your code - with the same input - does work on Linux in principle.

That said, your code references capture groups 3 and 4, whereas your regex only defines 2.

In other words: ${BASH_REMATCH[3]} and ${BASH_REMATCH[4]} are empty by definition.
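A minimal sketch of what BASH_REMATCH actually holds for the question's two-group pattern. It assumes the single input line shown in the question (leading whitespace dropped) and, for portability, swaps \s for [[:space:]], which is equivalent on Linux. Element 0 is the whole match, elements 1 and 2 are the groups, and index 3 is simply unset:

```bash
#!/usr/bin/env bash
# Sample line from the question (leading whitespace dropped).
line='MailAccountFetch                         2017-03-29 19:00:00  Success      5.0 Second(s)      2017-03-29 19:03:00'
task_name='MailAccountFetch'

# Same two capture groups as the question's pattern, with \s -> [[:space:]].
pattern="(${task_name})[[:space:]]+([0-9]{4}-[0-9]{2}-[0-9]{2}[[:space:]][0-9]{2}:[0-9]{2}:[0-9]{2})"

if [[ $line =~ $pattern ]]; then
    group_count=$(( ${#BASH_REMATCH[@]} - 1 ))   # element 0 is the whole match
    task=${BASH_REMATCH[1]}                      # first group: the task name
    task_last_run=${BASH_REMATCH[2]}             # second group: the timestamp
    echo "groups: $group_count"
    echo "task: $task, last run: $task_last_run"
    echo "group 3: '${BASH_REMATCH[3]}'"         # empty: the pattern has no 3rd group
fi
```

In other words, the question's assignments need indices 1 and 2, or the pattern needs more groups.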

Note, however, that if =~ signals success, BASH_REMATCH is never fully empty: at the very least, even in the absence of any capture groups, ${BASH_REMATCH[0]} (the text matched by the whole pattern) will be defined.
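A quick check of that point, as a sketch; the sample string and pattern are arbitrary and chosen only for illustration. A successful match against a pattern with no capture groups still leaves exactly one element in BASH_REMATCH:

```bash
#!/usr/bin/env bash
# A successful match against a pattern with no capture groups:
# BASH_REMATCH still holds one element, the text that matched.
if [[ 'abc123' =~ [0-9]+ ]]; then
    element_count=${#BASH_REMATCH[@]}
    whole_match=${BASH_REMATCH[0]}
    echo "elements: $element_count, [0]='$whole_match'"   # elements: 1, [0]='123'
fi
```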


There are some general points worth making:

  • Your shebang line reads #!/usr/bin/env /bin/bash, which is effectively the same as #!/bin/bash.

    • /usr/bin/env is typically used when you want a bash other than /bin/bash to execute, such as one you installed later and also added to the PATH:
      #!/usr/bin/env bash

    • ghoti points out that another reason for using #!/usr/bin/env bash is to also support less common platforms such as FreeBSD, where bash, if installed, is located in /usr/local/bin rather than the usual /bin.

    • In either scenario it is less predictable which bash binary will be executed, because it depends on the effective $PATH value at the time of invocation.

  • =~ is one of the few Bash features that are platform-dependent: it uses the particular regex dialect implemented by the platform's regex libraries.

    • \s is a character class shortcut that is not available on all platforms, notably not on macOS; the POSIX-compliant equivalent is [[:space:]].

    • (In your particular case, \s should work, however, because your Bash --version output suggests that you are on a Linux distro.)

  • It's better not to use all-uppercase shell variable names such as INPUT, so as to avoid conflicts with environment variables and special shell variables.
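A small probe tying these points together, as a sketch: it prints which bash a #!/usr/bin/env bash line would resolve to on the current PATH, and tests whether the local regex library understands \s. The patterns are stored in variables so that Bash hands the backslash to the regex engine unchanged; the variable names are lowercase per the last point, and the sample string is arbitrary.

```bash
#!/usr/bin/env bash
# Which bash would '#!/usr/bin/env bash' pick? The first one found in $PATH.
bash_in_path=$(command -v bash)
echo "env would run: $bash_in_path (this script is running bash $BASH_VERSION)"

# Probe the platform regex library with a shortcut class and its POSIX equivalent.
shortcut_re='^a\sb$'
posix_re='^a[[:space:]]b$'
sample='a b'

if [[ $sample =~ $shortcut_re ]]; then shortcut_ok=yes; else shortcut_ok=no; fi
if [[ $sample =~ $posix_re ]]; then posix_ok=yes; else posix_ok=no; fi

echo "\\s supported: $shortcut_ok"        # platform-dependent (e.g. no on macOS)
echo "[[:space:]] supported: $posix_ok"   # POSIX class, should be yes everywhere
```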

mklement0
  • @Inian: Because the macOS regex libraries do not support `\s` - I've updated the answer to make that clearer. – mklement0 Mar 31 '17 at 19:05

Bash uses the system's libraries to parse regular expressions, and different implementations support different features. You've run into a case where regex shorthand classes (such as \s and \S) do not work. Note the following:

$ s="one12345   two"
$ [[ $s =~ ^([a-z]+[0-9]{4})\S*\s+(.*) ]] && echo yep; declare -p BASH_REMATCH
declare -ar BASH_REMATCH=()
$ [[ $s =~ ^([a-z]+[0-9]{4})[^[:space:]]*[[:space:]]+(.*) ]] && echo yep; declare -p BASH_REMATCH
yep
declare -ar BASH_REMATCH=([0]="one12345   two" [1]="one1234" [2]="two")

I'm doing this on macOS, but I get the same behaviour on FreeBSD.

Simply replace \s with [[:space:]], \d with [[:digit:]], etc., and you should be good to go. If you avoid RE shortcuts, your expressions will be understood by more regex implementations.
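Applied to the question's input, that advice might look like the sketch below. It writes the pattern using only [[:space:]] and [[:digit:]], and, assuming from the indices 2, 3 and 4 used in the question that the intent was to also capture the result and the duration, it adds the two missing groups. The helper variable names (sp, timestamp, duration) are illustrative, and the input is inlined rather than read from input.txt.

```bash
#!/usr/bin/env bash
# Input from the question; the original script read this from input.txt.
input='    MailAccountFetch                         2017-03-29 19:00:00  Success      5.0 Second(s)      2017-03-29 19:03:00'
task_name='MailAccountFetch'

# Building blocks written with POSIX classes instead of \s and \d.
sp='[[:space:]]+'
timestamp='[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}[[:space:]][[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}'
duration='[[:digit:].]+ Second\(s\)'   # \( and \) are literal parentheses

# Groups: 1 = task, 2 = last run, 3 = result, 4 = duration (assumed layout).
pattern="(${task_name})${sp}(${timestamp})${sp}([^[:space:]]+)${sp}(${duration})"

while read -r line; do
    if [[ $line =~ $pattern ]]; then
        task_last_run=${BASH_REMATCH[2]}
        task_result=${BASH_REMATCH[3]}
        task_execution_duration=${BASH_REMATCH[4]}
    fi
done <<< "$input"

echo "last run: $task_last_run"             # last run: 2017-03-29 19:00:00
echo "result:   $task_result"               # result:   Success
echo "duration: $task_execution_duration"   # duration: 5.0 Second(s)
```

Building the pattern from named pieces keeps each POSIX class readable and makes it easy to see which parenthesized group maps to which BASH_REMATCH index.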

ghoti
  • It's a good point in general (and covered in my answer), but it's not the OP's problem, as he's running on a Linux distro, where `\s` therefore _is_ available. – mklement0 Mar 31 '17 at 19:23