How does this regular expression (?:\d[ -]*?){13,16}\b work?

Question

Some people like to put - or space between subgroups of digits when writing their credit card number, hence the above REs would fail to capture them.

Can you please dissect the RE:

(?:\d[ -]*?){13,16}\b

and explain why it can solve the problem?

I know *? will match the previous element zero or more times, but as few times as possible.

This will match `one digit` followed by `any number ( 0 to many) of hyphen or space` which is optional and all this repeated 13 to 16 times. This is not a good regex to match credit card number. — , Apr 10 '16 at 17:57

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

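(?:some regex) indicates a non-capturing group
\d indicates that a decimal digit is expected, such as [0-9]
[] indicates a character class: match any one of the characters inside it
So [ -] matches a space or a -
* indicates 0 or more repetitions; the ? after it makes the repetition lazy (as few as possible)
{} is a range of repetition, written {min,max}
{13,16} => repeated at least 13 and at most 16 times
\b indicates a word boundary: the position between a word character (letter, digit or underscore) and a non-word character, or the start/end of the string

For this question, \b comes right after the non-capturing group, so the match normally has to end at a digit that is not immediately followed by another digit, letter or underscore.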
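To see what the trailing \b does in practice, here is a small sketch using Python's re module. The language and the sample strings are my own assumptions; the question does not name either. It also shows that, because there is no \b at the start, the pattern can match the tail of a longer run of digits.

```python
import re

# The pattern from the question, unchanged.
CARD = re.compile(r"(?:\d[ -]*?){13,16}\b")

# 13 digits directly followed by a letter: there is no word boundary
# between "3" and "x", so nothing matches.
no_boundary = CARD.search("1234567890123x")

# The same 13 digits followed by a space: the boundary after the
# last digit is found and the match ends there.
with_boundary = CARD.search("1234567890123 x")

# There is no \b at the start of the pattern, so it can also match
# the last 16 digits of a 17-digit run.
inside_long_run = CARD.search("12345678901234567")

print(no_boundary)               # None
print(with_boundary.group())     # 1234567890123
print(inside_long_run.group())   # 2345678901234567
```

The last case is one reason the pattern is usually considered too permissive for real validation.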

You can try the behavior of this regex on http://regexr.com/.

Some strings that match:

0332 - 221 - 212 - 111

0-11 -0151- 0151 - 10

0000 - 0000 - 0000 - 0

0000 - 0000 - 0000 - 00

0000 - 0000 - 0000 - 000

0000 - 0000 - 0000 - 0000

00000 - 0000 - 0000

0000-0000-0000-0000

0-0-0-0-0-0-0-0-00-0-0-0
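The examples above can be checked with a short Python sketch. Python's re module and the helper names are my own choice here, and the two counterexamples at the end are made up for illustration.

```python
import re

CARD = re.compile(r"(?:\d[ -]*?){13,16}\b")

examples = [
    "0332 - 221 - 212 - 111",
    "0-11 -0151- 0151 - 10",
    "0000 - 0000 - 0000 - 0",
    "0000 - 0000 - 0000 - 00",
    "0000 - 0000 - 0000 - 000",
    "0000 - 0000 - 0000 - 0000",
    "00000 - 0000 - 0000",
    "0000-0000-0000-0000",
    "0-0-0-0-0-0-0-0-00-0-0-0",
]

def digit_count(text):
    """Number of digit characters in text (hypothetical helper)."""
    return sum(ch.isdigit() for ch in text)

# fullmatch requires the whole string to be consumed by the pattern.
results = {text: CARD.fullmatch(text) is not None for text in examples}
for text, ok in results.items():
    print(f"{digit_count(text):2d} digits  match={ok}  {text!r}")

# Counterexamples: 12 digits is too few, 17 digits is too many.
too_short = CARD.fullmatch("0000 - 0000 - 0000")
too_long = CARD.fullmatch("0000 0000 0000 0000 0")
print(too_short, too_long)
```

Every listed string contains between 13 and 16 digits, which is the only thing the pattern really counts.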

score 0 · Answer 2 · edited May 23 '17 at 12:07

First of all, Capturing Groups are used to group parts of the RegEx. They are defined by putting () around the part you want in the group. A Non-Capturing Group is defined by adding ?: inside the (), like so: (?:data)

Secondly, a Lazy RegEx is one that tries to capture as little as possible, rather than as much as possible. Check out this StackOverflow question

How the RegEx works:

(?:          # Non-Capturing Group
  \d           # Digit
  [ -]*?       # Space or - (Hyphen), 0 or more times (Lazy)
)
{13,16}      # Repeats the Non-Capturing Group 13 to 16 times
\b           # Word Boundary

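The Non-Capturing Group matches 1 Digit followed by any number of Space (spaces) and - (hyphens). So 9, 9-, 9 - and 9 - - - can all be matched by one repetition of the group. Separators that come before the digit, as in -9-, are not part of it. Because the quantifier is Lazy, separators are only consumed when the rest of the pattern needs them.

The group is then repeated 13 to 16 times, once per Credit Card Digit, with each repetition also taking any separators that follow its digit.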
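A quick check of what one repetition of the group consumes on its own, sketched in Python (my choice of language; the variable names are only illustrative). Note that only separators after the digit belong to the group, not ones before it.

```python
import re

# One repetition of the non-capturing group, without the {13,16} or \b.
GROUP = re.compile(r"(?:\d[ -]*?)")

# With nothing after it in the pattern, the lazy quantifier takes no separators.
alone = GROUP.match("9 - -").group()

# fullmatch forces the lazy part to stretch over the trailing separators.
stretched = GROUP.fullmatch("9 - -") is not None

# A leading separator is not consumed, so matching at position 0 fails...
leading = GROUP.match("-9-")

# ...while search() skips the hyphen and finds just the digit.
found = GROUP.search("-9-").group()

print(repr(alone))      # '9'
print(stretched)        # True
print(leading)          # None
print(repr(found))      # '9'
```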

So these are all full valid matches:

9999 - 9999 - 9999 - 9
9999 - 9999 - 9999 - 99
9999 - 9999 - 9999 - 999
9999 - 9999 - 9999 - 9999
999 - 999 - 999 - 999 - 999
99-99-99-99-99-99-99
9-9-9-9-9-9-9-9-9-9-9-9-9

Live Demo on Regex101

score 0 · Answer 3 · edited May 23 '17 at 11:45

\d matches a digit (Does "\d" in regex mean a digit?)

\b marks the end of a word/number

[0-9]{13,16} indicates that the character class before the braces is repeated 13, 14, 15 or 16 times. The background is that older credit card numbers have 13 digits, while newer ones have 16 digits.

So \b\d{13,16}\b will find/match any sequence of 13 to 16 digits, meaning that it can be used to find credit card numbers without any '-'
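The difference between the plain pattern and the one from the question can be sketched in Python (an assumption on my part, as is the sample number, which is only a made-up 16-digit value):

```python
import re

PLAIN = re.compile(r"\b\d{13,16}\b")
SEPARATED = re.compile(r"(?:\d[ -]*?){13,16}\b")

unbroken = "4111111111111111"      # 16 digits, no separators
grouped = "4111-1111-1111-1111"    # same digits, hyphen-separated

plain_on_unbroken = PLAIN.search(unbroken)
plain_on_grouped = PLAIN.search(grouped)        # only runs of 4 digits: no match
separated_on_unbroken = SEPARATED.search(unbroken)
separated_on_grouped = SEPARATED.search(grouped)

print(plain_on_unbroken.group())       # 4111111111111111
print(plain_on_grouped)                # None
print(separated_on_unbroken.group())   # 4111111111111111
print(separated_on_grouped.group())    # 4111-1111-1111-1111
```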

?: is another special case meaning 'clustering without capturing' (use of colon symbol in regular expression)

(?:pattern) is used to match the pattern without capturing it as a separate group, so it does not appear in the list of captured groups; the text it matched is still part of the overall match (What is a non-capturing group? What does a question mark followed by a colon (?:) mean?)

* means zero or more repetitions of the character(s) in the element before

? means the characters in the element before can appear, but don't have to

*? (used in the regex above) is the non-greedy / lazy version i.e. [^a]*? means "a sequence of 0 or more characters, not containing 'a', as short as possible while conforming to the rest of the regular expression."
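A minimal Python sketch of greedy versus lazy matching on the separator class (the sample string is made up):

```python
import re

text = "9 - 9"

# Greedy: take as many separators as possible.
greedy = re.match(r"\d[ -]*", text).group()

# Lazy: take as few as possible, which is none when nothing follows in the pattern.
lazy = re.match(r"\d[ -]*?", text).group()

# Lazy, but forced to stretch so that the following \d can match.
lazy_forced = re.match(r"\d[ -]*?\d", text).group()

print(repr(greedy))        # '9 - '
print(repr(lazy))          # '9'
print(repr(lazy_forced))   # '9 - 9'
```

In the regex from the question, each repetition is followed either by another repetition that needs a digit or by the \b, so the lazy part still ends up covering the separators between digits.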

So the regex matches any sequence of 13 to 16 digits containing an arbitrary number of '-' and ' ' (spaces) after each digit

Note that the non-capturing group (?:...) matches but does not capture the substring: the regex \b(a)(?:b)(c)\b applied to the string "abc" matches, but the non-capturing group is skipped in the list of groups (a group is everything inside ( )):

match: "abc", group 1: "a", group 2: "c" ("b" does not appear in the group list)

However the entire match can be retrieved by calling group() on the match object, see Alan Moore's comment below.
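In Python, for instance (the thread does not say which language is meant, so this is only one possible reading of "calling group() on the match object"), the behaviour looks like this. The second sample string is made up.

```python
import re

m = re.search(r"\b(a)(?:b)(c)\b", "abc")
whole = m.group()        # the entire match, including the "b"
captured = m.groups()    # only the capturing groups

print(whole)      # abc
print(captured)   # ('a', 'c')

# The regex from the question has no capturing groups at all,
# but the whole match still contains the digits and the hyphens.
card = re.search(r"(?:\d[ -]*?){13,16}\b", "card: 1234-5678-9012-3456.")
card_whole = card.group()
card_groups = card.groups()

print(card_whole)    # 1234-5678-9012-3456
print(card_groups)   # ()
```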

You can test this on https://regex101.com/

See these Regex credit card number tests

Non-capturing groups don't *remove* anything. The entire match is still saved, and can be retrieved by calling `group()` on the `match` object. Capturing groups are used to extract or refer back to parts of the matched string independently from the whole match, while non-capturing groups just group things. The OP's regex uses the non-capturing group correctly because there's no point capturing the digits individually. — Alan Moore, Apr 10 '16 at 22:06
Edited the answer however i think it was not faulty. The non-capturing group matches but excludes the group from the match list (does not capture it), see http://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group — ralf htp, Apr 11 '16 at 04:56
I get you, but you said the group was excluded from the **result string**, which is confusing. It sounded like were talking about lookarounds, which *are* excluded from the overall match. — Alan Moore, Apr 11 '16 at 13:33
