Reg Ex is getting more digits than expected

Question

~~Please don't suggest any more links; I have seen them all a million times.~~
I have looked at many suggestions, such as Regex credit card number tests. However, I am not primarily concerned with verifying potential credit card numbers.

I want to locate (potential) credit card numbers in a document by identifying sequences of 12 to 19 digits (plus a few common separator characters between them). This is discussed in, e.g., Finding or Verifying Credit Card Numbers, which @TimBiegeleisen points to. But the suggested solution results in a few false negatives. (See section "Problems..." below.)

Sample input:

[ '232625427', 'please stop check 220 2000000 that was sent 6/10 reg mail and reissu fedex. Please charge to credit card 4610 0000 0000 0000 exp 05/99...thanks, Sxxx' ]
[ '232653042', 'MARKET PLACE: Exxxx or Bxxxx-Please set husband and wife up on monthly credit card payments. Name on the credit card is Hxxxx-Jxxxx Lxxxx (Maiden name, name on policy is different) Master card number 5424 0000 0000 0000 Exp 11-30-00. Thanks so much.' ]

Much more sample input is available in my RegEx101.com attempt.

My regex is

[1-9](\d[ ]?[ ]*?[-]?[-]*?[:]*?[:]?){11,18}\b

Problems with my RegEx

Numbers of 12 to 19 digits are not matched in full when text follows them immediately, without a separating space. It fails, e.g., on 4554-4545-4545-4545Visa.
Longer runs of digits are matched at the end rather than at the beginning: for 999999999999994190000000000000 I get 9994190000000000000 instead of 9999999999999941900.

I am testing it at RegEx101.com.

Also read this: https://regular-expressions.mobi/creditcard.html?wlr=1 — Tim Biegeleisen, Dec 30 '17 at 12:18
@TimBiegeleisen I have already seen it a million times. Please follow the link and check the problem. I have already implemented a simpler solution than declaring a regex for each card issuer, but it does not work in the cases I mentioned above. If you follow that link, check lines 1 and 2; you will see my examples. Thank you — Mukhammad Ali, Dec 30 '17 at 12:32
The problems you state are primarily due to the trailing word boundary `\b`. Removing it will, however, probably change a few other matches. You would need to indicate which of these changes need to be addressed. On a different note: you could simplify your current regex to `[1-9](\d[ ]*[-]*[:]*){11,18}\b` and even to `[1-9](\d[- :]*){11,18}\b`, with no changes in matches for the input in [your regex](https://regex101.com/r/J2DgIq/6). (Mind that in the latter, the dash moved to the start of the list of allowed characters; otherwise it would signify the range of characters between blank and colon.) — Abecee, Dec 30 '17 at 23:19
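The remark about the dash position in the last comment can be checked with a small sketch. It assumes Python's `re` module (the question does not name a regex flavor), and the variable names are made up for illustration:

```python
import re

# Dash first: the class holds exactly three characters: dash, space, colon.
separators = re.compile(r"[- :]")
# Dash between space and colon: this is the range U+0020..U+003A, which
# also covers digits and punctuation such as "/" and ".".
accidental_range = re.compile(r"[ -:]")

for ch in ["-", " ", ":", "/", "5"]:
    print(
        repr(ch),
        bool(separators.fullmatch(ch)),
        bool(accidental_range.fullmatch(ch)),
    )
```

With the dash in the middle, a date-like fragment such as `05/99` would count as "separators plus digits", which is presumably not what is wanted when looking for card numbers.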

score 1 · Accepted Answer · answered Dec 31 '17 at 06:56

To address the problem in your title "Reg Ex is getting more digits than expected" (reading "digits" as "characters", though), try:

[1-9]([- :]*\d){11,18}\b

This way, you no longer match trailing blanks in your sample input. See it in action at RegEx101.com.
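As a rough illustration of the difference, here is a sketch in Python (an assumption; the regexes themselves are flavor-neutral for this purpose). It compares the simplified form from the comments, where separators follow each digit, with the form above, where separators precede each digit. The sample text is shortened from the question's input, and the variable names are illustrative only:

```python
import re

text = "Please charge to credit card 4610 0000 0000 0000 exp 05/99"

# Separators after each digit: the last group may swallow a trailing blank.
separators_after = re.compile(r"[1-9](\d[- :]*){11,18}\b")
# Separators before each digit: every group, and so the match, ends on a digit.
separators_before = re.compile(r"[1-9]([- :]*\d){11,18}\b")

with_trailing = separators_after.search(text).group(0)
without_trailing = separators_before.search(text).group(0)

print(repr(with_trailing))     # '4610 0000 0000 0000 '
print(repr(without_trailing))  # '4610 0000 0000 0000'
```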

Closer to what you pointed out under "Problems..." should be:

[1-9]([- :]*\d){11,18}

With the word boundary removed from the end, text immediately following the sequence of digits no longer causes false negatives. And the match is no longer biased towards the end of a potential match, either. This, however, handles 001 111111111111 differently from your approach: RegEx101.com.
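The effect of dropping `\b` can be sketched as follows, again assuming Python's `re` module and using the two problem inputs from the question. The names `with_boundary`, `without_boundary` and `first_match` are hypothetical helpers for this sketch:

```python
import re

with_boundary = re.compile(r"[1-9]([- :]*\d){11,18}\b")
without_boundary = re.compile(r"[1-9]([- :]*\d){11,18}")

glued = "4554-4545-4545-4545Visa"
long_run = "999999999999994190000000000000"


def first_match(pattern, text):
    """Return the first match as a string, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None


# With \b the engine must stop where a digit meets a non-word character,
# so it backtracks to the last dash in `glued` and can only end at the
# very end of `long_run`.
print(first_match(with_boundary, glued))      # 4554-4545-4545
print(first_match(with_boundary, long_run))   # the last 19 digits

# Without \b the greedy quantifier simply takes up to 19 digits from the
# leftmost possible start.
print(first_match(without_boundary, glued))     # 4554-4545-4545-4545
print(first_match(without_boundary, long_run))  # the first 19 digits
```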

This could be accounted for with

[1-9][0-9]([- :]*\d){10,17}

at the cost of allowing a few more zeros from "5452 0000 0000 0000000": RegEx101.com.
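How the three variants treat the `001 111111111111` case can be compared with a short sketch. It assumes Python's `re` module, uses the simplified regex from the comments as a stand-in for the question's approach, and builds the input explicitly so the number of ones (12) is unambiguous. All variable names are illustrative:

```python
import re

text = "001 " + "1" * 12  # "001" followed by a blank and twelve ones

question_style = re.compile(r"[1-9](\d[- :]*){11,18}\b")
one_leading = re.compile(r"[1-9]([- :]*\d){11,18}")
two_leading = re.compile(r"[1-9][0-9]([- :]*\d){10,17}")

question_match = question_style.search(text).group(0)
one_leading_match = one_leading.search(text).group(0)
two_leading_match = two_leading.search(text).group(0)

# Requiring two adjacent digits at the start stops the match from
# beginning at the "1" of "001" and bridging the blank.
print(repr(question_match))     # twelve ones
print(repr(one_leading_match))  # '1 ' followed by twelve ones
print(repr(two_leading_match))  # twelve ones
```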

All suggestions were checked against your sample input only. Different input might require further tweaking.

Please comment if this requires adjustment or further detail.
