1

I need to verify whether a string is a correctly formatted range of numbers in Python 2.7. I mean the kind of range used, for example, when choosing specific pages to print, like this:

1,3,5-7, 10, 15 - 20

I would like it to accept whitespace around the dashes and the commas, since people often use whitespace in such ranges.

I tried regex without much luck. Python 3 has re.fullmatch, which matches only if the whole string matches the pattern, but it does not exist in Python 2.7. I tried this way of doing it in Python 2, which apparently works properly, but my regex seems to be wrong. I tried many different regexes, and all of them failed in one way or another. The last one allowed wrong characters at the beginning of the line (this is for commas only; I didn't get to dashes yet):

^\d+$|(\d+)(\s?)(,{1})(\s?)(\d+)

I am not bound to use the regex for this, however it would be nice to know how this can be fixed with regex.
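A minimal sketch of why that last pattern lets stray characters through, assuming it was tested with re.search (the question does not say which call was used): the alternation `|` binds more loosely than the anchors, so `^` and `$` apply only to the first alternative, and the comma alternative can match anywhere in the string. The helper name `fullmatch_py2` is hypothetical; it shows one common way to emulate re.fullmatch on Python 2.7 by grouping the whole pattern and appending `\Z`, which is not necessarily the approach linked above.

```python
import re

# The pattern from the question, unchanged.
LOOSE = r'^\d+$|(\d+)(\s?)(,{1})(\s?)(\d+)'

def fullmatch_py2(pattern, text):
    """Hypothetical helper: emulate re.fullmatch on Python 2.7."""
    # Group the whole pattern so that \Z applies to every alternative.
    return re.match(r'(?:%s)\Z' % pattern, text)

# The second alternative is not anchored, so it matches inside the string.
print(re.search(LOOSE, 'abc1,2') is not None)       # leading junk is accepted
print(fullmatch_py2(LOOSE, 'abc1,2') is not None)   # rejected once anchored
print(fullmatch_py2(LOOSE, '1, 2') is not None)     # a two-number list passes
```

Even when anchored, this pattern only covers a single number or exactly two comma-separated numbers, so it would still need a repeating group to accept longer lists.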

Alex D.

5 Answers

3

If you want to use a regex, this might match your requirements:

^\d+(?: *- *\d+)?(?:, *\d+(?: *- *\d+)?)*$

Explanation

  • ^ Assert start of the line
  • \d+ Match one or more digits
  • (?: Non capturing group
    • ` *- *\d+` Match zero or more spaces followed by a hyphen, zero or more spaces and one or more digits
  • )? Close non capturing group and make it optional
  • (?: Non capturing group
    • , *\d+ Match a comma, zero or more whitespaces and one or more digits
    • (?: Non capturing group
      • ` *- *\d+` Match zero or more spaces followed by a hyphen, zero or more spaces and one or more digits
    • )? Close non capturing group and make it optional
  • )* Close non capturing group and repeat it zero or more times
  • $ Assert the end of the line
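As a rough sketch of how the pattern above could be used for validation (the function name `is_page_range` and the sample inputs are illustrative assumptions, not part of the answer), re.match with the explicit `^` and `$` anchors works on both Python 2.7 and Python 3:

```python
import re

# Pattern from the answer above: a number or range, followed by
# zero or more comma-separated numbers or ranges.
PAGE_RANGE = re.compile(r'^\d+(?: *- *\d+)?(?:, *\d+(?: *- *\d+)?)*$')

def is_page_range(text):
    """Return True if text looks like '1,3,5-7, 10, 15 - 20'."""
    return PAGE_RANGE.match(text) is not None

# Print the verdict for a few well-formed and malformed inputs.
for sample in ['1,3,5-7, 10, 15 - 20', '7', '1,,3', 'a1,2', '5-']:
    print('%r -> %s' % (sample, is_page_range(sample)))
```

Note that `$` also matches just before a trailing newline, so `\Z` may be the safer end anchor if the input can end with a line break.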

Demo

The fourth bird
  • Thanks a lot, this works exactly as I meant. The only thing it doesn't match is a comma between two spaces: `4 , 6 , 7`. But I think I will be able to figure this out. Thanks for the great explanation as well. – Alex D. Jun 27 '18 at 15:24
  • @AlexD. You are welcome. You could match zero or more spaces before the comma by adding a space and an asterisk in front of it: [`^\d+(?: *- *\d+)?(?: *, *\d+(?: *- *\d+)?)*$`](https://regex101.com/r/RFdBE0/1) – The fourth bird Jun 27 '18 at 16:00
1

I wouldn't bother with regular expressions.

def verify(s):
    last = 0
    for r in s.replace(' ', '').split(','):
        # A single integer...
        try:
            x = y = int(r)
        except ValueError:
            # ... or a hyphen-separated pair of increasing integers
            try:
                # Convert to int so the comparison below is numeric,
                # not a string (or mixed int/string) comparison
                x, y = map(int, r.split("-"))
            except ValueError:
                return False

        # Every item must start after the previous one ended
        if not (last < x <= y):
            return False
        last = y

    return True

This also ensures that values occur in sorted order, which a regular expression is incapable of doing.

chepner
1

Since you need to parse it in the end anyway, it is more straightforward to use parsing as validation. Many Python libraries use this approach as well, for example JSON. It avoids duplicated logic (1. validation, 2. parsing), allows for more expressive error messages and is often much easier.

To parse a single literal, such as 4 or 1 - 3, split it and convert the start/stop values. This automatically raises a ValueError if numbers are not valid integers.

def page_range(literal):
    """
    Convert a single page literal into a sequence of pages

    :raises ValueError: if literal does not denote a valid page range
    """
    start, sep, stop = literal.partition('-') # '1 -3' => '1 ', '-', '3'; ' 4' => ' 4', '', ''
    if not start:  # may want to raise an error for empty page literals
        return []
    if not sep:  # no '-' in literal, just the start
        return [int(start)]
    # sep is present, literal is a range of pages
    return list(range(int(start), int(stop) + 1))

You can use this to aggregate the pages of multiple literals, such as 4, 1-3. By using exceptions, you can raise an error for the portion of the literal that is invalid:

def pages(literals):
    for literal in literals.split(','):
        try:
            yield page_range(literal.strip())
        except ValueError:  # parsing failed, raise manually to add details
            raise ValueError('Invalid page range: %r' % literal)
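Because pages is a generator, nothing is validated until it is iterated. A minimal usage sketch follows, with both functions condensed from above so the block runs on its own; the wrapper name `is_valid` is a hypothetical addition, not part of the answer:

```python
def page_range(literal):
    """Condensed from the answer: one literal -> list of page numbers."""
    start, sep, stop = literal.partition('-')
    if not start:
        return []
    if not sep:
        return [int(start)]
    return list(range(int(start), int(stop) + 1))

def pages(literals):
    for literal in literals.split(','):
        try:
            yield page_range(literal.strip())
        except ValueError:
            raise ValueError('Invalid page range: %r' % literal)

def is_valid(literals):
    """Hypothetical wrapper: validate by parsing every literal."""
    try:
        list(pages(literals))  # force the generator to run to the end
    except ValueError:
        return False
    return True

print(list(pages('4, 1 - 3')))
print(is_valid('1,3,5-7, 10, 15 - 20'))
print(is_valid('1, x-3'))
```

Calling `pages(...)` without consuming the result never raises, so a validator has to exhaust the generator as `is_valid` does here.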

MisterMiyagi
1

I would just split on commas, then optionally on a hyphen, and check that each hyphen-separated range has no more than two integers. The following function returns a generator that yields a list of 1 or 2 integers per range after validating the syntax:

import re

def parse(string):
    ranges = re.split(r'\s*,\s*', string.strip())  # split on commas, ignoring spaces
    for i in ranges:
        limits = re.split(r'\s*-\s*', i)   # optionally split on a hyphen
        if len(limits) > 2:                # at most one hyphen per range
            raise ValueError('More than one hyphen in a range ({})'
                     .format(i))
        for num in limits:
            # the whole token must be digits, not just its beginning
            if re.match(r'[0-9]+$', num) is None:
                raise ValueError('A page number is not an integer ({})'
                         .format(num))
        yield [ int(num) for num in limits ]

This happily ignores any spaces or tabs around the separators while still ensuring correct syntax.

Serge Ballesta
0

You can use the pattern '^(?:\s+|\d+\s*-\s*\d+|,|\d+)*$', which accepts any sequence of whitespace, number ranges (with any amount of whitespace around the hyphen, e.g. 1 - 3 and 1-3 both match), commas, or numbers.

The issue is that it doesn't require a comma to separate each occurrence so 1 2 3 4 is valid, and it doesn't care if commas are repeated so ,, , , is also valid. If you care about these things then this will not work well for you.
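A quick sketch that checks those caveats; the pattern is copied from above, and the sample strings are the ones mentioned in this answer plus one assumed malformed input (`1-`):

```python
import re

# Pattern from this answer: any mix of whitespace, ranges, commas and numbers.
LENIENT = re.compile(r'^(?:\s+|\d+\s*-\s*\d+|,|\d+)*$')

# Print whether each sample is accepted by the lenient pattern.
for sample in ['1 - 3', '1-3', '1 2 3 4', ',, , ,', '1-']:
    print('%r -> %s' % (sample, LENIENT.match(sample) is not None))
```

The first four samples are accepted, including the two the answer flags as undesirable, while the dangling hyphen is rejected.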

Jacob Boertjes