
I was solving this regex problem:

Given a test string s, write a RegEx that matches s under the following conditions:

s must start with Mr., Mrs., Ms., Dr. or Er.

The rest of the string must contain only one or more English alphabetic letters (upper and lowercase).

I used this pattern:

Regex_Pattern = r'^(Mr|Mrs|Ms|Dr|Er)\..[A-Za-z]+$'

but it failed the test case "Ms._underscore". Then I tried this pattern:

Regex_Pattern = r'^(Mr|Mrs|Ms|Dr|Er)[\..][A-Za-z]+$'

and it passed all test cases. I cannot figure out the difference.
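One quick way to see the difference is to run both patterns against the failing test case. The sketch below assumes Python's `re` module (matching the `r'...'` strings above); `Ms.Smith` is a hypothetical valid input added only for comparison.

```python
import re

# The two patterns from the question.
pattern_1 = r'^(Mr|Mrs|Ms|Dr|Er)\..[A-Za-z]+$'
pattern_2 = r'^(Mr|Mrs|Ms|Dr|Er)[\..][A-Za-z]+$'

# "Ms._underscore" is the failing test case; "Ms.Smith" is a hypothetical valid input.
for s in ["Ms._underscore", "Ms.Smith"]:
    print(s, bool(re.match(pattern_1, s)), bool(re.match(pattern_2, s)))
# prints:
# Ms._underscore True False
# Ms.Smith True True
```

pattern_1 accepts "Ms._underscore" because its unescaped `.` consumes the `_`; pattern_2 has no wildcard there, so `[A-Za-z]+` runs into the `_` and the match fails.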

Emma
omar

2 Answers


Here, we can visualize our two expressions and compare them:

[Images: visualizations of the two expressions]

We can see that the difference lies between \.. and [\..].

  • In the first one, we are saying that we must have a literal . followed by any single character.
  • In the second, we match just one character from the class: either \. or ., and both mean a literal period inside [], so [\..] is equal to [.].
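A small Python sketch of the two bullets above, assuming Python's `re` module: the class forms each match exactly one literal period, while `\..` needs two characters. The test strings are illustrative.

```python
import re

# Inside a character class, "\." and "." both mean a literal period,
# so [\..] lists the same character twice and behaves like [.] (and like \.).
for cls in [r'[\..]', r'[.]', r'\.']:
    print(cls, bool(re.fullmatch(cls, '.')), bool(re.fullmatch(cls, '_')))
# prints:
# [\..] True False
# [.] True False
# \. True False

# Outside a class, \.. is two tokens: a literal period, then any character.
print(bool(re.fullmatch(r'\..', '._')))  # True: "." then "_"
print(bool(re.fullmatch(r'\..', '.')))   # False: a second character is required
```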

Demo for second expression

RegEx Circuit

jex.im visualizes regular expressions.

Emma
  • did you mean "In the second, we want to just pass one of '.' or 'any char', which is equal to [.]"? – omar May 29 '19 at 14:10
  • @omar Yes, in the second case you take _either_ since that's what a character class `[]` does -- it matches either one of what is listed inside. (While the first case requires having both `.` and another character.) Emma -- it needs to be `[\..]` in the second case, not `[..]`. – zdim May 29 '19 at 17:10
  • @omar I didn't really finish the statement I meant to make (which may or may not be obvious?) -- in the first case it requires _two_ characters, in the second case it matches _one_ only. So this makes a difference for whether or not the whole thing matches, with patterns before and after it. – zdim May 30 '19 at 17:18

I think you may have the two patterns reversed. The second one does not match "Ms._underscore" and the first one does:

^(Mr|Mrs|Ms|Dr|Er)\..[A-Za-z]+$
Demo 1

^(Mr|Mrs|Ms|Dr|Er)[\..][A-Za-z]+$
Demo 2

The second one uses the character class [\..], inside which most regex metacharacters lose their special meaning: for example, . within [] matches a literal period rather than any character. The first pattern matches a literal period followed by any single character other than a newline. For details, see the "Explanation" panel on the right side of the demo links above.
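The "other than a newline" detail can be checked directly. In Python's `re` module, `.` skips `\n` unless the `re.DOTALL` flag is set, while `[.]` only ever matches a literal period; the inputs below are hypothetical.

```python
import re

# Outside a class, "." matches any character except a newline by default.
print(bool(re.fullmatch(r'\..', '.x')))              # True
print(bool(re.fullmatch(r'\..', '.\n')))             # False
# With re.DOTALL, "." also matches a newline.
print(bool(re.fullmatch(r'\..', '.\n', re.DOTALL)))  # True
# Inside a class, "." stays a literal period, even with re.DOTALL.
print(bool(re.fullmatch(r'[.]', 'x', re.DOTALL)))    # False
```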

UPDATE: your pattern would also match Ms.underscore (no space between title and name). So consider the following pattern, which I think works better for what you're looking for:

^(Mr|Mrs|Ms|Dr|Er)\.[ _][A-Za-z]+$
Demo 3
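To see what the suggested pattern accepts, here is a small Python check using the `re` module; "Ms. underscore" is a hypothetical input added alongside the two strings mentioned above.

```python
import re

# The pattern suggested in the UPDATE: a period, then exactly one space or underscore.
suggested = r'^(Mr|Mrs|Ms|Dr|Er)\.[ _][A-Za-z]+$'

for s in ["Ms.underscore", "Ms. underscore", "Ms._underscore"]:
    print(repr(s), bool(re.match(suggested, s)))
# prints:
# 'Ms.underscore' False
# 'Ms. underscore' True
# 'Ms._underscore' True
```

Because `[ _]` is mandatory, this pattern rejects a name that follows the period directly and accepts one separated by a space or an underscore.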

SanV
  • I was misunderstanding something: I thought "\.." as a whole was the escape for ".", but I think the second "." was just a period in the tutorial :). Thanks to this mistake, I now know that I do not need to escape most special characters inside brackets. https://stackoverflow.com/questions/19976018/does-a-dot-have-to-be-escaped-in-a-character-class-square-brackets-of-a-regula – omar May 30 '19 at 16:09
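A small Python sketch of the "most, but not all" point, assuming Python's `re` module; the character sets are made-up examples, and other regex flavors may differ in details such as where `]` may appear unescaped.

```python
import re

# Most metacharacters are literal inside a class, so no escaping is needed.
print(bool(re.fullmatch(r'[.*+?$()]+', '.*+?$()')))  # True

# A few characters still need care inside a class:
print(bool(re.fullmatch(r'[^a]', 'a')))    # False: leading ^ negates, meaning "not a"
print(bool(re.fullmatch(r'[\^a]', 'a')))   # True: escaped ^ is literal, class is {^, a}
print(bool(re.fullmatch(r'[a-c]', '-')))   # False: "-" between characters forms a range
print(bool(re.fullmatch(r'[a\-c]', '-')))  # True: escaped "-" is a literal hyphen
print(bool(re.fullmatch(r'[\]]', ']')))    # True: escaped "]" does not close the class
```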