
I'm trying to check if a string matches this format:

10_06_13

i.e. today's date, or any similar date of the form "2digits_2digits_2digits".

What I've done:

regex='([0-9][0-9][_][0-9][0-9][_][0-9][0-9])'
if [[ "$incoming_string" =~ $regex ]]
then
   # Do awesome stuff here
fi

This works to a certain extent. But when the incoming string equals 011_100_131, it still passes the regex check. How can I fix my regex to only accept the right format?

honk
Robbie
  • Note that the underscores don't need to be in square brackets. `_` matches the same thing as `[_]`. – chepner Jun 10 '13 at 22:53
  • 011_100_131 would not match your regex. 011_10_131 would. – Sep 18 '13 at 12:08
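The second comment's claim is easy to check by running the question's original, unanchored regex against both strings. This is a small sketch; the helper name `check` is hypothetical and only used for illustration. `BASH_REMATCH[0]` holds the part of the string that the regex actually matched.

```shell
#!/usr/bin/env bash
# The question's original regex, without anchors.
regex='([0-9][0-9][_][0-9][0-9][_][0-9][0-9])'

# Hypothetical helper: report whether $1 *contains* a match for $regex,
# and if so, which substring matched.
check() {
  if [[ $1 =~ $regex ]]; then
    echo "$1: match (${BASH_REMATCH[0]})"
  else
    echo "$1: no match"
  fi
}

check 011_100_131   # underscores are 4 characters apart, so no dd_dd_dd substring
check 011_10_131    # contains the substring 11_10_13
```

Because `=~` looks for a match anywhere in the string, `011_10_131` passes even though it is not in the wanted format.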

1 Answer


=~ succeeds if the string on the left contains a match for the regex on the right. If you want to know whether the entire string matches the regex, you need to "anchor" the regex on both sides, like this:

regex='^[0-9][0-9][_][0-9][0-9][_][0-9][0-9]$'
if [[ $incoming_string =~ $regex ]]
then
  # Do awesome stuff here
fi

The ^ only succeeds at the beginning of the string, and the $ only succeeds at the end.
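To see what the anchors change, the sketch below tests a few sample strings against the unanchored and the anchored pattern side by side. The function name `is_date_format` and the sample strings are illustrative assumptions, not part of the original answer.

```shell
#!/usr/bin/env bash
unanchored='[0-9][0-9]_[0-9][0-9]_[0-9][0-9]'
anchored='^[0-9][0-9]_[0-9][0-9]_[0-9][0-9]$'

# Hypothetical helper: succeed only if the WHOLE of $1 has the form dd_dd_dd.
is_date_format() {
  [[ $1 =~ $anchored ]]
}

for s in 10_06_13 011_10_131 x10_06_13 10_06_13x 10_6_13; do
  if [[ $s =~ $unanchored ]]; then u=yes; else u=no; fi
  if is_date_format "$s"; then a=yes; else a=no; fi
  printf '%-12s unanchored=%-3s anchored=%s\n' "$s" "$u" "$a"
done
```

Only `10_06_13` passes the anchored test; the strings with extra leading or trailing characters pass only the unanchored one.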

Notes:

  1. I removed the unnecessary () from the regex and "" from the [[ ... ]].
  2. The bash manual is misleadingly worded here: it says that =~ succeeds if the string matches, when it actually succeeds if any part of the string matches.
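Note 1 can be checked directly. Inside `[[ ... ]]` the left-hand operand is not subject to word splitting, so quoting it is optional. The right-hand side is different: in bash 3.2 and later, quoting the pattern makes it match as a literal string rather than as a regex, which is one reason to keep the regex in a variable and expand it unquoted. A minimal sketch; the variable `spaced` is an illustrative sample value.

```shell
#!/usr/bin/env bash
regex='^[0-9][0-9][_][0-9][0-9][_][0-9][0-9]$'

incoming_string='10_06_13'
# Unquoted left-hand side: safe inside [[ ]], because no word splitting happens.
if [[ $incoming_string =~ $regex ]]; then echo "unquoted lhs: match"; fi

spaced='10 06 13'
# Even with spaces in the value, [[ ]] treats it as a single operand (no error).
if [[ $spaced =~ $regex ]]; then echo "spaced: match"; else echo "spaced: no match"; fi

# Quoted right-hand side: bash (3.2+) matches the pattern as a literal string,
# i.e. it looks for the characters ^[0-9]... and therefore does not match.
if [[ $incoming_string =~ "$regex" ]]; then echo "quoted rhs: match"; else echo "quoted rhs: no match"; fi
```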
rici
  • Damn, I was so close! I presumed '^' was to exclude characters. Thank you very much! :) – Robbie Jun 10 '13 at 16:20
  • @Robbie: `^` means "excluding" when it is the first thing in a character set (`[...]`), and it means "anchored" when it is the first thing in a pattern. Otherwise, it just matches `^` (but that's not true in all regex implementations; sometimes it means "match the beginning of a line"). I agree that it's confusing until you get used to it. – rici Jun 10 '13 at 16:23
  • As mentioned above, you could replace `[_]` with `_` without changing what the regex matches. – Sep 18 '13 at 12:09
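The points from these comments (the two meanings of `^`, and `_` being equivalent to `[_]`) can be seen in a few lines. The `{2}` form is an extra, hedged assumption: it relies on the POSIX extended regex interval syntax that bash's `=~` uses, meaning "exactly two of the preceding item". The sample strings are illustrative.

```shell
#!/usr/bin/env bash
s='10_06_13'

# `_` and `[_]` match the same single character.
bracketed='^[0-9][0-9][_][0-9][0-9][_][0-9][0-9]$'
plain='^[0-9][0-9]_[0-9][0-9]_[0-9][0-9]$'
# Same pattern written with ERE interval expressions ({2} = exactly two).
compact='^[0-9]{2}_[0-9]{2}_[0-9]{2}$'

for re in "$bracketed" "$plain" "$compact"; do
  if [[ $s =~ $re ]]; then echo "match: $re"; fi
done

# `^` as the first character inside [...] negates the set:
# [^0-9] matches any single character that is NOT a digit.
if [[ $s =~ [^0-9] ]]; then echo "first non-digit: ${BASH_REMATCH[0]}"; fi

# `^` at the start of the pattern anchors it to the beginning of the string.
if ! [[ x10_06_13 =~ ^[0-9] ]]; then echo "x10_06_13 does not start with a digit"; fi
```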