
The first thing I tried was to write a regex that matches what I DON'T want. That way, I could just flip it to NOT accept that same input. This is how I came up with the first part of this regex.

  • Accept all 9-digit numbers where all 9 digits are identical (without dashes): "^(\d)\1{8}$". This expression works as expected (as seen here: https://regex101.com/r/Ez8YC3/1).
  • The second expression should do the same, with dashes formatted as xxx-xx-xxxx: "^(\d)\1{2}-(\d)\1{1}-(\d)\1{3}$". This expression works as expected (as seen here: https://regex101.com/r/bodzIX/1).

What I want to do at this point is combine them to look for BOTH conditions. However, when I do that it seems to break, and only matches 9-digit numbers that are identical throughout WITH dashes: "^(\d)\1{2}-(\d)\1{1}-(\d)\1{3}$|^(\d)\1{8}$". This can be seen here: https://regex101.com/r/lPnksf/1.

I may be getting a little ahead of myself here, but in order to show my work as much as possible, I also tried flipping those regex separately, which also did not work as expected.

I would expect the two expressions (when flipped) to match any 9-digit number (with or without dashes) where the digits are not all identical. However, this does not happen at all.

This is the final regex that I came up with, which is clearly not doing what I would expect it to: "^(?!(\d)\1{2}-(\d)\1{1}-(\d)\1{3})$|^(?!(\d)\1{8})$". Can be seen here: https://regex101.com/r/9eHhF5/1

At the end of the day, I want to combine these 2 expressions, with this one (that already works as intended): "^(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d\d\d\d$". Can be seen here: https://regex101.com/r/AdRI8i/1.

I am still pretty new to regex, and really want to understand why I can't simply wrap the condition in (?!...) in order to match the opposite condition.

Thank you in advance

  • There's a joke about regexes: I have a problem that I solved with a regex, now I have *two* problems. The problem is that often that's not really a joke, but the plain truth. Regular expressions can be incredibly powerful, but as you know also incredibly hard. And more often than not there are other solutions than a regex. And some things, like repetition or pair matching (think HTML open and close tags) are either impossible or need special extensions. – Some programmer dude Sep 08 '21 at 06:02
  • Also please take some time to refresh [the help pages](http://stackoverflow.com/help), take the SO [tour], read [ask], as well as [this question checklist](https://codeblog.jonskeet.uk/2012/11/24/stack-overflow-question-checklist/). And tell us what regexes you have tried, how they worked or not worked, and do it *in* the question itself. Together with examples of valid and invalid input. – Some programmer dude Sep 08 '21 at 06:03
  • Some programmer dude, I really appreciate your response! I thought I did a pretty good job at showing what I have already tried, and explaining why it didn't work. Each link in the question shows a regex expression and what it matches/doesn't match. I don't mean this rudely, but is that not enough context? – NoobProgrammer3000 Sep 08 '21 at 06:08
  • Questions should be *self contained*. What happens if those links become invalid in the future? – Some programmer dude Sep 08 '21 at 06:10
  • Ahhhhhh I see your point! I'll make some edits to fix that. Thanks, I really appreciate it! – NoobProgrammer3000 Sep 08 '21 at 06:11
  • It's okay. And I apologize if I might have come off as a little rude. That wasn't my intention. :) – Some programmer dude Sep 08 '21 at 06:12
  • Would the expression [`^(?!(\d)(?:-?\1){8})\d{3}(?:-\d\d-|\d\d)\d{4}$`](https://regex101.com/r/K4obxi/1) work for you? What I'm not sure about is when you mention you want to combine it with the one you already had at the end of your post. Maybe you can clarify with examples? – JvdV Sep 08 '21 at 06:35

2 Answers


This regex matches what you don't want as a social security number:

^(?:(\d)\1{8})|(?:(\d)\2{2}-\2{2}-\2{4})$

Demo

This regex matches only what you want:

^(?!(?:(\d)\1{8})|(?:(\d)\2{2}-\2{2}-\2{4})).*$

Demo
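A quick way to see what the negated pattern accepts is to run it over a few samples. The sketch below uses Python's `re` module; the helper name `accepted` and the sample strings are assumptions made for illustration, not part of the answer.

```python
import re

# Negated pattern from the answer above, copied verbatim.
NOT_ALL_IDENTICAL = re.compile(
    r"^(?!(?:(\d)\1{8})|(?:(\d)\2{2}-\2{2}-\2{4})).*$"
)


def accepted(text: str) -> bool:
    """Return True when the negated pattern matches the input."""
    return NOT_ALL_IDENTICAL.match(text) is not None


# Hypothetical sample inputs, chosen only to exercise both alternatives.
samples = ["111111111", "111-11-1111", "123456789", "123-45-6789", "hello"]
for sample in samples:
    print(sample, accepted(sample))
```

Because the consuming part is just `.*`, the pattern only rules out the all-identical inputs; anything else, including text such as `hello`, is accepted, so the format rules still have to be added separately.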

Mustofa Rizwan
  • This does work mostly, but it doesn't account for this expression "^(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d\d\d\d$", which says the first 3 digits can't be 000, 666, or any 9xx. The fourth and fifth can't be 00, and the last 4 can't be 0000. I will look more into the ?: operator for sure, as that seems to be a very useful piece of this puzzle. – NoobProgrammer3000 Sep 08 '21 at 07:04
  • @NoobProgrammer3000 [Non-capturing groups](https://stackoverflow.com/questions/3512471) are very useful indeed, however, they are not that important in this case as you may *chain* negative lookaheads. – Wiktor Stribiżew Sep 08 '21 at 07:17

What you want to do is not flip, but reverse the regex logic.

Yes, to reverse the pattern logic, you should use a negative lookahead, but there are caveats.

First, the $ end of string anchor: if it was at the end of the "positive" regex, it must also be moved to the lookahead in the reverse pattern. So, your ^(?!(\d)\1{8})$ regex must be written as ^(?!(\d)\1{8}$). Same goes for your second regex.
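The effect of the anchor placement can be checked with a small Python sketch. The first pattern is the attempt from the question; the second is the corrected lookahead with a trailing `\d{9}$` appended, which is an assumption added here so that the pattern also consumes the digits. The variable names are illustrative.

```python
import re

# Question's attempt: `$` sits outside the lookahead, directly after `^`,
# so the pattern can only ever match an empty string.
misplaced = re.compile(r"^(?!(\d)\1{8})$")

# Corrected form: `$` moved inside the lookahead. The trailing `\d{9}$`
# is an assumption added for this demo so that the digits are consumed.
corrected = re.compile(r"^(?!(\d)\1{8}$)\d{9}$")

for text in ["", "123456789", "111111111"]:
    print(repr(text), bool(misplaced.search(text)), bool(corrected.search(text)))
```

With `$` outside the lookahead, nothing is left to consume the nine digits, which is why the flipped pattern from the question never matches a real input.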

Next, mind that each subsequent capturing group gets an incremented ID number, so you cannot keep the same backreferences when you "join" patterns with the OR | operator. You must adjust these IDs to reflect their new values in the new regex.
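The renumbering issue can be reproduced with the combined pattern from the question. This is a minimal sketch in Python's `re`, where a backreference to a group that did not take part in the match fails; the names `broken` and `renumbered` are illustrative.

```python
import re

# Combined pattern from the question: the second alternative reuses `\1`,
# but its `(\d)` is group 4, so `\1` refers to a group that never matched.
broken = re.compile(r"^(\d)\1{2}-(\d)\1{1}-(\d)\1{3}$|^(\d)\1{8}$")

# Same pattern with the backreference in the second alternative renumbered.
renumbered = re.compile(r"^(\d)\1{2}-(\d)\1{1}-(\d)\1{3}$|^(\d)\4{8}$")

for text in ["111-11-1111", "111111111"]:
    print(text, bool(broken.match(text)), bool(renumbered.match(text)))
```

Only the undashed input behaves differently between the two patterns, which matches the symptom described in the question.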

So, you want to match a string that matches ^(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d\d\d\d$ first (let's note \d\d\d\d = \d{4}), then you can add restrictions with lookaheads:

  • (?!(\d)\1{8}$) - fails the match if, immediately from the current position, it matches nine identical digits and then the end of the string
  • (?!(\d)\2\2-(\d)\2-(\d)\2{3}$) - (note that the group IDs continue incrementing) fails the match if, immediately from the current position, it matches three identical digits, -, two identical digits, -, four identical digits, and then the end of the string.

So, to follow your logic, you can use

^(?!(\d)\1{8}$)(?!(\d)\2\2-(\d)\2-(\d)\2{3}$)(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d{4}$

See the regex demo

As lookaheads are non-consuming patterns, i.e. the regex index stays where it was before their pattern sequences were tried, all three lookaheads are tried at the start of the string (see the ^ anchor). If the pattern inside any of the three negative lookaheads matches there, the whole match fails right away.
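To see the chained lookaheads working together, the final pattern can be run against a handful of inputs. This is a sketch in Python's `re`; the helper `is_valid` and the sample values are assumptions made for illustration.

```python
import re

# Final pattern from above, copied verbatim and split over two lines.
SSN = re.compile(
    r"^(?!(\d)\1{8}$)(?!(\d)\2\2-(\d)\2-(\d)\2{3}$)"
    r"(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d{4}$"
)


def is_valid(text: str) -> bool:
    """Return True when the input passes every lookahead and the format."""
    return SSN.match(text) is not None


# Hypothetical samples: one accepted value, then one per rejection rule.
samples = [
    "123-45-6789",  # accepted
    "111-11-1111",  # all digits identical
    "000-12-3456",  # first block is 000
    "666-12-3456",  # first block is 666
    "912-34-5678",  # first block starts with 9
    "123-00-4567",  # second block is 00
    "123-45-0000",  # third block is 0000
    "123456789",    # no dashes, so the consuming part cannot match
]
for sample in samples:
    print(sample, is_valid(sample))
```

One detail that shows up when testing: the second and third blocks of the dashed lookahead each start with their own `(\d)` group, so an input such as `111-21-3111` is rejected as well. If only fully identical digits should be excluded, a variant like `(?!(\d)\2\2-\2\2-\2{4}$)` may be closer to the intent; that variant is a suggestion, not part of the answer.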

Wiktor Stribiżew
  • After writing this, I realize there is some need for clarification, as `(?!(\d)\1{8}$)` is redundant in the pattern. Do you need to match 9 digit strings with not all of them identical? If yes, use `^(?!(\d)\1{8}$)(?!(\d)\2\2-(\d)\2-(\d)\2{3}$)(?:(?!000|666|9\d\d)\d{3}-(?!00)\d\d-(?!0000)\d{4}|\d{9})$`. Else, remove `(?!(\d)\1{8}$)`. – Wiktor Stribiżew Sep 08 '21 at 07:23
  • I fell asleep around 4:30am last night and still couldn't stop thinking about this problem. You have no idea how much I appreciate your explanation here. You have made my day, and I actually learned so much by reading this. My frustration came from not understanding the proper placement of the "$" end-of-string anchor. This makes sense and works flawlessly. You are a god among men, thank you! – NoobProgrammer3000 Sep 08 '21 at 14:02
  • @NoobProgrammer3000 Same happened to me yesterday, we need to have a good rest this night :) Happy to help. – Wiktor Stribiżew Sep 08 '21 at 14:04