
I am learning regular expressions for bash scripting. However, when I test a string match, the lines never match. Here is what I am testing:

I have a text file containing a list of "songs":

$ cat soundtrack.txt
Ludwig Van Beethoven - 01 - Allero.oog
Ludwig Van Beethoven - 02 - Adag.mp3
Ludwig Van Beethoven - 03 - Beach.oog
Ludwig Van Beethoven - 04 - Caven Adven.wmv

I would like to use a regex to extract the "track number" (the numeric field) from each line.

Here's the script:

$ cat soundtrack.sh
#!/bin/bash
IFS=$'\n'
for CD in `cat soundtrack.txt`
do
    if [[ "$CD" =~ "([[:alpha:][:blank:]]*)- ([[:digit:]]*) - (.*)$" ]]
    then
        echo "Found ${BASH_REMATCH[2]}"
    fi
done 

However, the bash debug output shows that the string does not match the regex:

$ bash -x soundtrack.sh
+ IFS='
'
++ cat soundtrack.txt
+ for CD in '`cat soundtrack.txt`'
+ [[ Ludwig Van Beethoven - 01 - Allero.oog =~ \(\[\[:alpha:]\[:blank:]]\*\)- \(\[\[:digit:]]\*\) - \(\.\*\)\$ ]]
+ for CD in '`cat soundtrack.txt`'
+ [[ Ludwig Van Beethoven - 02 - Adag.mp3 =~ \(\[\[:alpha:]\[:blank:]]\*\)- \(\[\[:digit:]]\*\) - \(\.\*\)\$ ]]
+ for CD in '`cat soundtrack.txt`'
+ [[ Ludwig Van Beethoven - 03 - Beach.oog =~ \(\[\[:alpha:]\[:blank:]]\*\)- \(\[\[:digit:]]\*\) - \(\.\*\)\$ ]]
+ for CD in '`cat soundtrack.txt`'
+ [[ Ludwig Van Beethoven - 04 - Caven Adven.wmv =~ \(\[\[:alpha:]\[:blank:]]\*\)- \(\[\[:digit:]]\*\) - \(\.\*\)\$ ]]

But if I test directly in the shell with the same expression, it works:

$ if [[ "Ludwig Van Beethoven - 01 - Allero.oog" =~ ([[:alpha:][:blank:]]*)-\ ([[:digit:]]*)\ -\ (.*)$ ]]; then echo yes; else echo no; fi
yes

What's wrong with my script? Do I have to add extra quotes or backslashes? It just doesn't make sense to me.

P.S.

$ bash --version
GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)
NeilWang

2 Answers


The catch is that these things are not the same:

[[ "$CD" =~ "([[:alpha:][:blank:]]*)- ([[:digit:]]*) - (.*)$" ]]
[[ "$CD" =~ ([[:alpha:][:blank:]]*)-\ ([[:digit:]]*)\ -\ (.*)$ ]]

The first version is how you wrote it in the script, and the second is how you ran it in the shell.

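That is, if you double-quote the pattern, its regex symbols are matched literally (bash has behaved this way since version 3.2). You cannot enclose the whole pattern in double quotes; escape the spaces with backslashes instead, as in your shell test.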
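A minimal sketch of the difference, assuming bash 3.2 or later. The sample line comes from the question; the variable names (`line`, `re`, `quoted_result`, `unquoted_result`, `track`) are only for illustration. Keeping the regex in a variable and expanding it unquoted on the right of `=~` is a common way to keep the spaces in the pattern without backslash-escaping each one.

```shell
#!/usr/bin/env bash
# Sample line taken from the question's soundtrack.txt
line='Ludwig Van Beethoven - 01 - Allero.oog'

# Quoted pattern: every character, including ( [ * $, is matched literally,
# so this only succeeds if the line contains the regex text itself.
if [[ $line =~ "([[:alpha:][:blank:]]*)- ([[:digit:]]*) - (.*)$" ]]; then
    quoted_result=match
else
    quoted_result=no-match
fi

# Unquoted pattern kept in a variable: the single quotes protect the spaces
# during assignment, and the unquoted $re after =~ is used as a regex.
re='([[:alpha:][:blank:]]*)- ([[:digit:]]*) - (.*)$'
if [[ $line =~ $re ]]; then
    unquoted_result=match
    track=${BASH_REMATCH[2]}   # second capture group: the track number
else
    unquoted_result=no-match
fi

echo "quoted:   $quoted_result"
echo "unquoted: $unquoted_result (track ${track:-none})"
```

The quoted test fails for the same reason the script in the question fails, while the variable version extracts `01`.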

janos

The first problem is that you're quoting the regex, which takes away all of its special regex powers: only quote the literal bits, particularly if they are spaces. The second problem is that you're using a for loop to read the file: don't do that. Use a while read loop instead:

while IFS= read -r CD; do
    if [[ "$CD" =~ ([[:alpha:][:blank:]]*)"- "([[:digit:]]*)" - "(.*) ]]
    then
        echo "Found ${BASH_REMATCH[2]}"
    fi
done < soundtrack.txt
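A small sketch of why the for loop is fragile, assuming bash and a scratch directory created with `mktemp -d`; the file names and the list contents are made up for illustration. Even with `IFS=$'\n'`, the unquoted `$(cat ...)` still undergoes pathname expansion, so a line containing `*` can be replaced by matching file names, and blank lines disappear. `while IFS= read -r` keeps every line exactly as written.

```shell
#!/usr/bin/env bash
# Scratch directory so the glob below only sees files created here.
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch" || exit 1

# Hypothetical track list: one line contains a glob character, one is blank.
touch 'Track 05 - Intro.mp3'
printf '%s\n' 'Track 05 - *.mp3' '' 'Track 06 - Finale.mp3' > list.txt

# for loop over $(cat ...): split on newlines, then each word is globbed.
IFS=$'\n'
for_lines=()
for line in $(cat list.txt); do
    for_lines+=("$line")
done
unset IFS   # restore default word splitting

# while read loop: every line, including the blank one, is kept verbatim.
read_lines=()
while IFS= read -r line; do
    read_lines+=("$line")
done < list.txt

printf 'for:   %s\n' "${for_lines[@]}"
printf 'while: %s\n' "${read_lines[@]}"
```

The for loop ends up with only two entries, and the first one is the file name `Track 05 - Intro.mp3` rather than the original line; the while loop keeps all three lines.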
glenn jackman
  • Thank you, it works. The syntax in the book is wrong! It quotes the entire regex string, as I demonstrated. – NeilWang Dec 13 '18 at 23:35
  • It's possible there are differences in older versions of bash: it may have taken a couple of releases for the syntax to settle down. – glenn jackman Dec 14 '18 at 02:54