6

I'm trying to create a Python regex for "sequences of one- or two-digit numbers separated by optional multiple spaces or an optional single comma."

For example:

"   1"  Should test good
"    1  2     3 3  4 5 7 17" Should test good
" 1, 2,3,11,74" Should test good
"1,11,14, 15" Should test good

"111, 101" Should not test good
"1 2 3  a" Should not test good
"1, 25, 5.0 " Should not test good
"1,, 7, 80" Should not test good
"1,11,14," Should not test good

Comma signs should only appear between numbers (or white spaces). That's why the last example shouldn't test good.

I tried with this:

^\s*\d{1,2}(\s*\,?\d{1,2}\s*\,?)*\s*$

But the results were not good; for example, "11111" would test good. How should I write my regex?
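
The problem can be reproduced with a few lines of Python. This is only a minimal sketch; the name `ATTEMPT` is made up for illustration. Because the spaces and the commas inside the repeated group are all optional, the group can match bare digits, so a run like "11111" is read as "11" + "11" + "1", and a trailing comma is accepted as well.

```python
import re

# The attempted pattern from above, unchanged.
ATTEMPT = re.compile(r"^\s*\d{1,2}(\s*\,?\d{1,2}\s*\,?)*\s*$")

# Both of these should be rejected, but the pattern accepts them.
for text in ["11111", "1,11,14,"]:
    print(repr(text), ATTEMPT.match(text) is not None)
# '11111' True
# '1,11,14,' True
```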

diegoaguilar

4 Answers

5

This regex should work:

^(\s*\d{1,2}\s*$)|^((\s*\d{1,2}\s*[\,\s]\s*\d{1,2}\s*))+([\,\s]\s*\d{1,2}\b\s*)*$

Note that to match between one and two repetitions you use {1,2}, where the number before the comma is the lower bound and the number after the comma is the upper bound.

The way it works is that we match either ^(\s*\d{1,2}\s*$) or ^((\s*\d{1,2}\s*[\,\s]\s*\d{1,2}\s*))+([\,\s]\s*\d{1,2}\b\s*)*$.

For the first option, we first look for the beginning of the string ^. Next, we look for optional whitespace of any length \s*, followed by a number of one or two digits \d{1,2}, followed by optional whitespace of any length, then the end of the string $.

For the second option, we allow optional whitespace \s*, followed by a one- or two-digit number \d{1,2}, followed by optional whitespace \s*. Next we require either a comma or a whitespace character [\,\s]. Then we allow optional whitespace again \s*, followed by one or two digits \d{1,2}, followed by optional whitespace \s*. This group must occur at least once (+) to be considered a match, so whitespace alone or anything starting with a comma will not match.

It can be followed by a comma or whitespace character [\,\s], followed by optional whitespace \s*, followed by a one- or two-digit number \d{1,2}. This is followed by a word boundary \b and optional whitespace \s*. This last group can occur any number of times, including zero, hence *, and is followed by $, the end of the string.

Moishe Lipsker
  • @diegoaguilar I'm pretty sure there is a way. I'm just not sure how to do it yet though. I'm still learning regex also. – Moishe Lipsker Mar 25 '15 at 06:21
  • @diegoaguilar Try this regex for requiring each number to have a ! followed by at least one space before it ^(\s*\!\s+\d{1,2}\s*$)|^((\s*\!\s+\d{1,2}\s*[\,\s]\s*\!\s+\d{1,2}\s*))+([\,\s]\s*\!\s+\d{1,2}\b\s*)*$ – Moishe Lipsker Mar 25 '15 at 09:31
4

Using the `regex` module for Python (not the built-in `re` module), you can use this (rather convoluted!) regex:

(?:^\s*|\G)\s*(?:,\s*)?\K(\b\d{1,2}\b)(?=(?:\s*(?:,\s*)?\b\d{1,2}\b)*$)

regex101 demo

(?:^\s*|\G)                    # Matches beginning of line and any spaces, or at the end of the previous match
\s*(?:,\s*)?                   # Spaces and optional comma
\K                             # Resets the match
(\b\d{1,2}\b)                  # Match and capture 1-2 digits
(?=                            # Makes sure there is (ahead) ...
  (?:
     \s*(?:,\s*)?\b\d{1,2}\b   # A sequence of spaces (with optional comma) and 1-2 digits...
  )*                           # ... any number of times until...
$)                             # ... the end of the line

This one should be faster:

(?:^(?=(?:\s*(?:,\s*)?\b\d{1,2}\b)*$)|\G)\s*(?:,\s*)?\K(\b\d{1,2}\b)
Jerry
  • What's the difference against re module? And what are its caveats if any? – diegoaguilar Mar 25 '15 at 06:41
  • @diegoaguilar The difference is that this module has more features, those that are currently present in PCRE (perl compatible regular expressions). Especially here, I'm using `\G` and `\K` which are not available in the `re` module. And I'm not sure what you mean by counterparts. – Jerry Mar 25 '15 at 06:43
  • Sorry, I meant caveats :P – diegoaguilar Mar 25 '15 at 06:44
  • @diegoaguilar Oh there haha. Well, the first one is that it's not the default module, so you have to install it (I have yet to install it myself, but then, I prefer using multiple regexes if necessary instead of a long one like that even if I know how to, yes I'm terrible at installing stuff...). Second... I cannot think of any. With the specific regex above, I would suspect it to be slightly slower than the previous answers due to the lookahead (making the regex engine go back and forth quite a few times). – Jerry Mar 25 '15 at 06:48
  • Which one do you mean by "specific regex above"? – diegoaguilar Mar 25 '15 at 06:49
  • I guess `\G` or `\K` one of those probably `\G` is not there in `regex module` – vks Mar 25 '15 at 06:50
  • @diegoaguilar The one in my answer. The `(?= ...)` part is called a positive lookahead. It 'checks' the string while not consuming it within a match. Useful, but at a cost. – Jerry Mar 25 '15 at 07:00
  • I'm duly impressed with the capabilities of PCRE. However, OP states _"Comma signs should only appear between numbers (or white spaces)"_ and though they did not give this example, I deduce that `,5,2` should not match because the first comma has no number before it. This regex does match that. – asontu Mar 25 '15 at 08:42
  • @funkwurm Indeed, I did not take that into consideration. I guess a negative lookahead `(?!\s*,)` could be used at the beginning, near the first assertion, to prevent that. – Jerry Mar 25 '15 at 08:54
1

You can modify your regex

^\s*\b\d{1,2}\b(?:\s*\,?\s*\b\d{1,2}\b)*\s*$

See demo.

https://regex101.com/r/sJ9gM7/5#python
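
A quick way to check this pattern against the examples from the question is a small Python script. This is only a sketch: the names `PATTERN`, `is_valid`, `should_pass` and `should_fail` are made up here for illustration, and the example strings are copied from the question.

```python
import re

# Pattern from this answer. The \b word boundaries stop a run such as
# "111" from being read as "11" followed by "1".
PATTERN = re.compile(r"^\s*\b\d{1,2}\b(?:\s*\,?\s*\b\d{1,2}\b)*\s*$")

def is_valid(text):
    """Return True if text matches the pattern, False otherwise."""
    return PATTERN.match(text) is not None

should_pass = ["   1", "    1  2     3 3  4 5 7 17", " 1, 2,3,11,74", "1,11,14, 15"]
should_fail = ["111, 101", "1 2 3  a", "1, 25, 5.0 ", "1,, 7, 80", "1,11,14,"]

for text in should_pass + should_fail:
    print(repr(text), is_valid(text))
```

Every string in `should_pass` should print `True` and every string in `should_fail` should print `False`.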

vks
  • Why does it include a word boundary? – diegoaguilar Mar 25 '15 at 05:24
  • Great, thanks. Can you extend the answer and explain the syntax: how to capture multiple groups, and how you ensured there could not be a trailing comma without a number after it? Also, how did you use flags to make a good regex? – diegoaguilar Mar 25 '15 at 05:48
  • @diegoaguilar I just modified your regex to include `\b` so that continuous runs of digits are not matched. Nothing else has changed. – vks Mar 25 '15 at 05:49
  • I appreciate extended answers. I'm quite new to regular expressions, so I need to understand everything deeply. Also, getting each number captured would be a plus. – diegoaguilar Mar 25 '15 at 06:08
  • @diegoaguilar you can't do both `match` and `capture` in the same regex, as Python only remembers the last value captured by a repeated group. What you can do is first use `re.match` to validate the string, and then call `re.findall(r"\b\d+\b", text)` on the same string to get each number. – vks Mar 25 '15 at 06:11
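
The two-step approach from the last comment (validate first, then pull out the numbers) could look roughly like this. It is a sketch, not a definitive implementation: the names `VALID` and `extract_numbers` are hypothetical, and returning `None` for invalid input is an assumption about what the caller wants.

```python
import re

# Validation pattern from the answer above.
VALID = re.compile(r"^\s*\b\d{1,2}\b(?:\s*\,?\s*\b\d{1,2}\b)*\s*$")

def extract_numbers(text):
    """Return the numbers in text as ints, or None if text is not valid."""
    if VALID.match(text) is None:
        return None
    # The string is already known to be valid, so a simple findall is enough.
    return [int(n) for n in re.findall(r"\b\d+\b", text)]

print(extract_numbers("11 11 44 77, 7"))  # [11, 11, 44, 77, 7]
print(extract_numbers("1,11,14,"))        # None
```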
1

This one also ensures you only have a single comma

^\s*\d{1,2}(\s*[,\s]\s*\d{1,2})*\s*$

Here's a demo:

https://regex101.com/r/jW7qL5/1

Additional requested info

The demo gives an explanation of the syntax (right panel).

The expression [,\s]\s*\d{1,2} ensures that the comma always appears before one or two digits (with optional space between).

I use the flags gm (global and multiline) to match on several lines of text, but this depends on how you want to use it.

Use the following regex to capture the numbers

^\s*(\d{1,2})(?:\s*[,\s]\s*(\d{1,2}))*\s*$

The (?: syntax is used to prevent that group from being captured.
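
As a rough illustration of what this capturing version returns in Python's `re` module (the variable names here are made up): a capturing group that sits inside a repeated group only keeps the value from its last repetition, so only the first and the last number end up in the groups.

```python
import re

# Capturing version of the pattern from this answer.
pattern = re.compile(r"^\s*(\d{1,2})(?:\s*[,\s]\s*(\d{1,2}))*\s*$")

m = pattern.match("11 11 44 77, 7")
print(m.group(1))  # 11 (the first number)
print(m.group(2))  # 7  (only the last repetition of the inner group survives)

# With a single number the repeated group never runs, so group 2 is None.
print(pattern.match("5").groups())  # ('5', None)
```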

sparkplug
  • Great, thanks. Can you extend the answer and explain the syntax: how to capture multiple groups, and how you ensured there could not be a trailing comma without a number after it? Also, how did you use flags to make a good regex? – diegoaguilar Mar 25 '15 at 05:41
  • What do you want to capture in groups? – sparkplug Mar 25 '15 at 05:53
  • Is there any way to get all the valid numbers in capture groups? – diegoaguilar Mar 25 '15 at 05:55
  • Yes - try this: ^\s*(?<num>\d{1,2})(\s*[,\s]\s*(?<num>\d{1,2}))*\s*$ and check the named capture group "num" – sparkplug Mar 25 '15 at 05:58
  • The problem is that for "11 11 44 77, 7" only the first and last matches are being captured – diegoaguilar Mar 25 '15 at 06:10
  • I think that's just an issue with Python regex captures. Not sure how you can get round that in one regex. See this post: http://stackoverflow.com/questions/8651347/regex-to-match-a-capturing-group-one-or-more-times – sparkplug Mar 25 '15 at 06:15