YARP. (Yup, another regex problem).
I'm not sure there is a clearer way to describe this than with concrete examples.
Sample text:
1. 4444 4444 4444 4444
2. 4444444444444444
3. 44 44 44 44 44 44 44 44
4. 4444-4444-4444-4444
5. 4444 (multiple spaces) 4444 (multiple spaces) 4444 (multiple spaces) 4444
6. 0.4444444444444444
7. 0.4444 4444 4444 4444
I need to build a regex that will match samples 1, 2 and 4 only. Requirements: 13-16 digits; dashes and spaces are optional separators, but only one at a time (a single space or a single dash), and no more than 3 separators in total.
This is obviously related to searching for credit card (CC) data. I've done a ton of research and found many examples that match most, all or none of the samples, but nothing that eliminates excessive false positives like 3 and 5 above. I'm using PowerGREP 5 and have read the entire tutorial at https://www.regular-expressions.info/tutorial.html, but I cannot figure out how to limit the number of optional whitespace characters in the overall match. For example, "1 2 3 4 5 6 7 8 9" matches just as well as "123 456 789" if I make the space(s) optional. Essentially, I want the regex to abandon the match if more than 3 spaces/dashes are detected.
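One way to express "at most 3 separators" is to stop making the separator optional per digit and instead describe the number as one digit group followed by up to three separator-plus-group repeats, with a lookahead that counts the digits. The sketch below uses Python's re module only because it is easy to run; the name CARD_LIKE and the sample list are illustrative, and porting the pattern to PowerGREP is an untested assumption (it needs lookahead and fixed-width lookbehind support).

```python
import re

# Illustrative pattern (not from the original post): 13-16 digits,
# single space/dash separators only, and no more than 3 separators.
CARD_LIKE = re.compile(
    r"""
    (?<![\d.])                # not preceded by a digit or a decimal point
    (?<!\d[ -])               # not the continuation of an earlier digit group
    (?=(?:[ -]?\d){13,16}     # look ahead: 13-16 digits, single separators allowed
       (?![ -]?\d))           #   and the digit run must end right there
    \d+(?:[ -]\d+){0,3}       # consume the run: at most 3 separators
    (?![ -]?\d)               # no further digit group directly after
    """,
    re.VERBOSE,
)

# The seven sample strings; the run of three spaces stands in for
# "(multiple spaces)".
SAMPLES = [
    "4444 4444 4444 4444",
    "4444444444444444",
    "44 44 44 44 44 44 44 44",
    "4444-4444-4444-4444",
    "4444   4444   4444   4444",
    "0.4444444444444444",
    "0.4444 4444 4444 4444",
]

matched = [s for s in SAMPLES if CARD_LIKE.search(s)]
for s in matched:
    print(s)
```

The key change is `\d+(?:[ -]\d+){0,3}`: a separator can only appear between two digit groups, and the `{0,3}` quantifier caps how many there are. The lookahead handles the 13-16 digit count separately, so the two requirements do not have to be squeezed into one quantifier.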
Side note: I work for a company that deals with a TON of calendar data, so grepping a huge drive with many "1 2 3 4 5 6 7 8 ..." style text strings generates a ton of false hits, even when I take the time to tailor searches to CC-specific patterns.
Any help would be super appreciated.
The closest I've found is:
\b(?:\d[ -]*?){13,16}\b
This grabs any 13-16 digits (allowing for dashes or spaces in between) as expected, but it also matches "1 2 3 4 5 6 7 8 9 10 11", which is obviously not helpful.
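To see why that pattern cannot tell the two cases apart, it helps to count the separators inside each match: `[ -]*?` is attached to every digit, so nothing in the pattern tracks the total. A small Python check, where the helper name separator_count is made up for illustration:

```python
import re

# The pattern quoted above: every digit may be followed by any run of separators.
LOOSE = re.compile(r"\b(?:\d[ -]*?){13,16}\b")

def separator_count(text):
    """Return how many spaces/dashes are inside the first LOOSE match, or None."""
    m = LOOSE.search(text)
    if m is None:
        return None
    return sum(ch in " -" for ch in m.group())

print(separator_count("1 2 3 4 5 6 7 8 9 10 11"))  # 10 separators, still a match
print(separator_count("4444 4444 4444 4444"))      # 3 separators
```

Both strings match, and the only difference is a count the pattern never looks at. The lazy `*?` does not help here either, since laziness changes which match is tried first, not which strings can match.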
An all-inclusive, brand-aware CC regex that fails to find valid numbers if they contain spaces/dashes (but will find UK telephone numbers, heh):
\b(?:4[0-9]{12}(?:[0-9]{3})?|(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|6(?:011|5[0-9]{2})[0-9]{12}|(?:2131|1800|35\d{3})\d{11})\b
So then I tried replacing every [0-9] character class above with (?:\d[ -]*?). That finds valid CCs with dashes/spaces, but it also matches all the "1 2 3 4 5 6 7 8 9 10 11" type false positives.
I am very new to regex, so if I'm committing a huge noob error, please feel free to point me in the right direction. Thank you!
Edit:
Replacing [0-9] with (?:\d[ -]?) in just the longer consecutive-digit parts seems to be pretty close to what I need. I grepped the same drive as before and got only 311 matches, including all 3 known positive files. I can live with just 308 false matches, but I have to imagine there's still a better way to do this. And it's still matching strings of 13-16 digits with more than 3 delimiters...
Current regex:
\b(?:4(?:\d[ -]?){12}(?:[0-9]{3})?|(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)(?:\d[ -]?){12}|3[47](?:\d[ -]?){13}|3(?:0[0-5]|[68][0-9])(?:\d[ -]?){11}|6(?:011|5[0-9]{2})(?:\d[ -]?){12}|(?:2131|1800|35\d{3})(?:\d[ -]?){11})\b
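An alternative to weaving separators into the brand regex is a two-step check: first find runs of digit groups with at most 3 single separators, then strip the separators and test the bare digits against the digits-only brand regex quoted earlier. The following is a minimal Python sketch of that idea, not a PowerGREP recipe; STRUCTURE, BRANDS and find_cards are hypothetical names, and whether PowerGREP can chain two steps like this is not something the sketch assumes.

```python
import re

# Step 1 (illustrative): digit groups joined by single spaces/dashes,
# at most 3 separators, not part of a decimal or a longer digit run.
STRUCTURE = re.compile(r"(?<![\d.])(?<!\d[ -])\d+(?:[ -]\d+){0,3}(?![ -]?\d)")

# Step 2: the digits-only brand regex from the question, split across
# lines purely for readability.
BRANDS = re.compile(
    r"\b(?:4[0-9]{12}(?:[0-9]{3})?"
    r"|(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}"
    r"|3[47][0-9]{13}"
    r"|3(?:0[0-5]|[68][0-9])[0-9]{11}"
    r"|6(?:011|5[0-9]{2})[0-9]{12}"
    r"|(?:2131|1800|35\d{3})\d{11})\b"
)

def find_cards(text):
    """Return substrings that pass both the structure and the brand check."""
    hits = []
    for m in STRUCTURE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())  # drop the separators
        if BRANDS.fullmatch(digits):
            hits.append(m.group())
    return hits

print(find_cards("card: 4444-4444-4444-4444, row: 44 44 44 44 44 44 44 44"))
```

Keeping the two concerns apart means the brand regex stays exactly as published, and the separator limit lives in one short pattern that is easier to adjust.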