126

It's a simple question about regular expressions, but I'm not finding the answer.

I want to determine whether a number appears in sequence exactly two or four times. What syntax can I use?

\d{what goes here?}

I tried \d{2,4}, but this expression accepts three digits as well.

Dmitry Ginzburg
Renato Dinhani
  • 1
    For example, to match a two- or four-digit **year**. – DavidRR Oct 18 '12 at 18:23
  • What do you want to happen if the string is `abc 123 xyz`? Should it match `12` because that is exactly two digits in sequence? Or should it not, because `12` is part of a larger digit sequence `123` which itself is neither 2 nor 4 long? If I had to guess, I'd think you want the latter behaviour, but it isn't clear from your question. Examples and/or a clearer specification would help. Same question for `abc 12345 def`... what should happen there? – Jean-François Corbett Apr 30 '20 at 11:20

2 Answers

192

There's no specific syntax for that, but there are lots of ways to do it:

(?:\d{4}|\d{2})    <-- alternation: four digits if possible, else just two
\d{2}(?:\d{2})?    <-- two digits, plus two more if possible
(?:\d{2}){1,2}     <-- two digits, times one or two

So, for example, to match strings consisting of one or more letters A–Z followed by either two or four digits, you might write ^[A-Z]+(?:\d{4}|\d{2})$; and to match a comma-separated list of two-or-four-digit numbers, you might write ^(?:\d{4},|\d{2},)*(?:\d{4}|\d{2})$ or ^(?:\d{2}(?:\d{2})?,)*\d{2}(?:\d{2})?$.
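As a rough check of the variants above, here is a small Python sketch. Python's `re` module, the helper name `accepts`, and the sample strings are assumptions chosen for illustration; they are not part of the question. `re.fullmatch` requires the whole string to match, which plays the role of the `^...$` anchors.

```python
import re

# The three "two or four digits" fragments listed above.
FRAGMENTS = [
    r"(?:\d{4}|\d{2})",   # alternation: four digits if possible, else two
    r"\d{2}(?:\d{2})?",   # two digits, plus two more if possible
    r"(?:\d{2}){1,2}",    # two digits, times one or two
]

# Comma-separated list of two-or-four-digit numbers (second list form above).
LIST_RE = re.compile(r"(?:\d{2}(?:\d{2})?,)*\d{2}(?:\d{2})?")


def accepts(fragment: str, text: str) -> bool:
    """True if `text` is one or more letters A-Z followed by 2 or 4 digits."""
    return re.fullmatch(r"[A-Z]+" + fragment, text) is not None


samples = ["AB12", "AB1234", "AB1", "AB123", "AB12345"]
for fragment in FRAGMENTS:
    # Each fragment accepts only the first two samples.
    print(fragment, [accepts(fragment, s) for s in samples])

# The whole list must consist of 2- or 4-digit items.
print(LIST_RE.fullmatch("12,3456,78") is not None)  # True
print(LIST_RE.fullmatch("12,345") is not None)      # False
```

All three fragments should behave identically here; which one to use is mostly a matter of readability.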

ruakh
  • 2
    Personally, only thought of the `\d{2}(?:\d{2})?` solution right off the bat - nice variety of these - the last one, in particular, seeming very nice and scalable. – Nightfirecat Nov 18 '11 at 02:48
  • 3
    +1 for being mindful of the order needed when using alternation to match 4 digits first, then 2 digits. Also good job providing the other variations. – Ahmad Mageed Nov 18 '11 at 02:57
  • 10
    For anyone who, like me, didn't understand the use of `(?:` this starts a "non-capturing group" (a group that is not intended to be referenced in a replace statement). You could also just use parens but these will create a capturing group. Further details here: http://stackoverflow.com/questions/3512471/non-capturing-group – Jeremy Moritz Oct 15 '14 at 20:44
  • These will show the same result for both "333" and "33" – Dan Mar 01 '17 at 06:21
  • 1
    @Dan: These regexes do *not* match the complete string `"333"`. You may be using your regex library's "find matching substring" functionality by mistake, rather than its "check if complete string matches" functionality. You should consult its documentation. – ruakh Mar 01 '17 at 06:44
  • @ruakh I'm actually using latest chrome javascript, I don't think there is an option to change the behavior like that there, I used other javascript solution, thanks – Dan Mar 01 '17 at 21:49
  • @Dan: In JavaScript I'd suggest just using `^` and `$` to "anchor" the regex match: for example, `/^(?:\d{4}|\d{2})$/.test(s)` will be true if `s` is a string consisting of exactly 2 or 4 digits. – ruakh Mar 01 '17 at 22:20
  • `^\d\d$|^20\d\d$`: I just need to test whether a field is either a 2-digit year or a 4-digit year. This will reject 1, 123, 12345 and 2199, and accept 18 or 2018. – zzapper Apr 20 '18 at 11:54
  • I feel sure 4 or 2 digits would grab 3 digits. So this is wrong in my eyes. – JGFMK Apr 30 '20 at 07:32
  • @JGFMK: Your link shows that, in a sequence of three digits, `(?:\d{4}|\d{2})` only matches two. That's exactly what the OP is looking for. (Note that the OP is not asking for a complete regular expression, but for a syntax to use *in* a regular expression. So, for example, the overall requirement might be "match `foo` plus two-or-four-digits plus `bar`", in which case the complete regex might be `^foo(?:\d{4}|\d{2})bar$`.) – ruakh Apr 30 '20 at 15:45
  • When you do regex matches, you invariably want the match groups back to do something with. If you have 3 digits and want 2 or 4, which of the 3 chars do you take: the first two or the last two? That sort of lack of usefulness of the regex is why I felt this was wrong. And the point "but this expression accepts three digits as well" implies an undesired behaviour that your solution did nothing to address. – JGFMK Apr 30 '20 at 16:43
  • @JGFMK: Again -- this answer does not provide complete regexes, but just the *part* of the regex that the OP needs. The OP hasn't posted his/her full regex, just the problematic part, so I couldn't post a full corrected version, just a corrected version of the problematic part. (But if it makes you feel better, there are regex engines -- such as the one built into Java -- that provide functionality to match the *whole* string against a given regex, rather than implicitly matching a *substring*. In such a regex engine, the versions in this answer can be used as the complete regex.) – ruakh Apr 30 '20 at 17:04
  • Ironically, I was using Java when I uncovered the issue. And there are nuances, as you say, between using String.matches() and Matcher.find(). I am familiar with variances in other languages such as Python/JavaScript/TypeScript too. – JGFMK Apr 30 '20 at 21:10
21
(?<!\d)(\d{2}|\d{4})(?!\d)

This is the correct way to do it. The accepted answer is wrong.

It would match 3 digits (or 5). So that is wrong in my eyes.

  1. Check that there is no digit immediately before or after a sequence of two or four digits.
  • (?<!) syntax is negative lookbehind

  • (?!) syntax is negative lookahead.

The above works when the digits appear in the middle of a longer string.

If your search string has no content around the digits, you could instead use the ^ and $ start- and end-of-string anchors:

^\d{4}$|^\d{2}$
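
To illustrate how the two patterns above behave, here is a small Python sketch. Python's `re` module, the constant names, and the sample text are assumptions made for illustration. It compares the lookaround pattern against a plain alternation when searching inside a longer string, and then tries the anchored pattern on standalone values.

```python
import re

# Lookaround version: a run of 2 or 4 digits with no digit on either side.
EXACT_RUN = re.compile(r"(?<!\d)(\d{2}|\d{4})(?!\d)")

# Plain alternation, for comparison: also matches inside longer digit runs.
PLAIN = re.compile(r"(?:\d{4}|\d{2})")

# Anchored version, for strings with nothing around the digits.
# (In Python, `$` also tolerates a single trailing newline.)
ANCHORED = re.compile(r"^\d{4}$|^\d{2}$")

text = "abc 12 def 123 ghi 1234 jkl 12345 mno"

print(EXACT_RUN.findall(text))  # ['12', '1234']
print(PLAIN.findall(text))      # ['12', '12', '1234', '1234']

for value in ["18", "2018", "123", "12345"]:
    print(value, ANCHORED.search(value) is not None)
```

With the lookarounds, the runs `123` and `12345` produce no match at all, whereas the plain alternation picks a two- or four-digit piece out of each of them.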
JGFMK
  • 4
    I wouldn't say that the [accepted answer](https://stackoverflow.com/a/8177150/119775) is wrong. I would say the *question* is unclear, and that that answer addresses one valid interpretation of it. Your answer addresses another valid interpretation (which I happen to think is a more likely one -- but apparently the asker didn't...). – Jean-François Corbett Apr 30 '20 at 11:24
  • 7
    "It would match 3 digits" is not quite accurate. I think you mean "It would match a 2-digit subsequence of a 3-digit sequence." – Jean-François Corbett Apr 30 '20 at 11:27
  • 2
    Also, your answer [doesn't quite work as intended on sequences of 5 or more digits](https://regex101.com/r/GxN7wm/2). I'm no regex expert, but I guess one way to [fix it](https://regex101.com/r/hVbQaA/1) is to make the negative lookahead/behind apply to both cases (2- and 4-digit sequences): `(?<!\d)(\d{2}|\d{4})(?!\d)` – Jean-François Corbett Apr 30 '20 at 11:39
  • I think you are correct about the 5 digits. Thanks for that correction. Will fix that. – JGFMK Apr 30 '20 at 14:39
  • `^\d{4}$|^\d{2}$` would be a potential way to fix that. As would `^\d{2}(?!\d)|^\d{4}(?!\d)` – JGFMK Apr 30 '20 at 14:42
  • 1
    @Jean-FrançoisCorbett - The person asking specifically said... "but this expression accepts three digits as well". So I hold true the answer was wrong. It doesn't fix that. – JGFMK Apr 30 '20 at 14:56
  • The OP said that, but what does "accept" mean? Does it mean "match"? If it does, then by that standard, [this answer](https://stackoverflow.com/a/8177150/119775) is ok, because it doesn't match (entire) 3-digit sequences, whereas the OP's regex does. But does "accept" mean "match"? I don't know. The question is ambiguous. Again, I think your interpretation makes more sense, but with the question as it is, there's no way to know for sure. – Jean-François Corbett Apr 30 '20 at 17:51