I'm writing a Bash script that needs to detect non-ASCII characters in filenames. I'm using POSIX bracket-expression syntax to match them, but when I test for the match in an if/then statement, the test always returns an exit status of 2 and never matches my test string.

Here's the code in question:

FILEREQ_SOURCEFILE="Filename–WithNonASćII-Charàcters-05sec_23.98.mov"
REGEX_MATCH_NONASCII="[^[:ascii:]]"

if [[ $FILEREQ_SOURCEFILE =~ $REGEX_MATCH_NONASCII ]]; then
    echo "Exit Status: $?"
    echo "Matched!"
else
    echo "Exit Status: $?"
    echo "No Match"
fi

This code always returns:

Exit Status: 2
No Match

I've read and re-read the bash-hackers.org explanation of how regex matching works, as well as this previous question on SO regarding matching non-ASCII characters, but for the life of me, I can't get this to work. What am I missing here?

I'm running this under Bash 3.2, on Mac OS X 10.9.2.

  • Which character is non ASCII in your filename? – Basilevs May 27 '14 at 03:24
  • The first "–" (it's an en-dash, not a hyphen), the "ć" and the "à". – PreservedMoose May 27 '14 at 03:28
  • 1
    The funny thing is that if I simplify this by changing the regex to match on *any* ASCII character ([[:ascii:]], instead of [^[:ascii:]]), and change $FILEREQ_SOURCEFILE to something simple like "Filename.mov", I still get the same results. – PreservedMoose May 27 '14 at 03:31

1 Answer

From the bash(1) man page, SHELL GRAMMAR section, Compound Commands subsection, [[ expression ]] subsubsection:

If the regular expression is syntactically incorrect, the conditional expression's return value is 2.
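The three possible outcomes of the =~ operator can be seen side by side in a small sketch. The helper name regex_status is hypothetical (it is not part of Bash or of the question), and the third call assumes a regex library that rejects unknown class names such as "ascii", which is the behaviour described in the question.

```shell
#!/usr/bin/env bash
# regex_status is a hypothetical helper: it prints the exit status
# produced by [[ string =~ regex ]] for the given arguments.
regex_status() {
    local string=$1 regex=$2 status=0
    [[ $string =~ $regex ]] || status=$?
    echo "$status"
}

regex_status "Filename.mov" "[[:alpha:]]"    # valid regex that matches
regex_status "Filename.mov" "^[[:digit:]]"   # valid regex that does not match
regex_status "Filename.mov" "[[:ascii:]]"    # unknown class name, so the regex is invalid
```

A status of 2 therefore points at the pattern itself rather than at the string being tested, which is why swapping in a simpler filename made no difference.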

From the regex(7) man page:

Standard character class names are:

          alnum   digit   punct
          alpha   graph   space
          blank   lower   upper
          cntrl   print   xdigit

There is no "ascii" in there. Perhaps you should try [\0-\x7f] instead (or [^\0-\x7f] as the case may be).
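Whether an escape-based bracket expression behaves as intended can depend on the regex library and locale, since POSIX bracket expressions treat a backslash as a literal character. As one possible alternative, here is a minimal sketch that sidesteps the regex entirely and checks at the byte level with tr. The function name has_non_ascii is hypothetical, and the approach (deleting every ASCII byte and seeing whether anything is left) is a substitute technique, not what the answer above proposes.

```shell
#!/usr/bin/env bash
# has_non_ascii is a hypothetical helper name, not from the original post.
# It succeeds (status 0) if the argument contains any byte above 0x7F.
has_non_ascii() {
    local leftover
    # In the C locale tr works on single bytes. Delete every ASCII byte
    # (octal 000-177); whatever remains must be non-ASCII.
    leftover=$(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177')
    [[ -n $leftover ]]
}

FILEREQ_SOURCEFILE="Filename–WithNonASćII-Charàcters-05sec_23.98.mov"

if has_non_ascii "$FILEREQ_SOURCEFILE"; then
    echo "Matched!"
else
    echo "No Match"
fi
```

Because the check is done on bytes, it does not depend on which character classes the regex library knows about, at the cost of spawning one tr process per filename.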

Ignacio Vazquez-Abrams
  • +1; quibble: on OS X, you must use `man re_format` (`man 7 regex` won't work, and `man regex` will show you `regex(3)`). – mklement0 May 27 '14 at 05:11