Let's say I have dates in any of these formats:

12 September, 2016
September 12, 2016
2016 September, 12

I need a regex that always returns its matches in the same order, whichever of the date formats above is given:

match-1 : 12
match-2 : September
match-3 : 2016

I need results in the same order always.

jaco0646
REDDY PRASAD

3 Answers


You can't switch the group order but you can name your groups:

(r'(?P<day>[\d]{2})(?:\s|,|\?|$)|(?P<month>[a-zA-Z]+)|(?P<year>[\d]{4})')
  • (?P<day>[\d]{2})(?:\s|,|\?|$): matches a two-digit day followed by whitespace, a comma, a literal ? or the end of the string; access it in Python with l.group("day")

  • (?P<month>[a-zA-Z]+): matches a month name; access it in Python with l.group("month")

  • (?P<year>[\d]{4}): matches a four-digit year; access it in Python with l.group("year")

Example:

import re

data = """
12 September, 2016
September 12, 2016
2016 September, 12
September 17, 2012
17 October, 2015
"""

rgx = re.compile(r'(?P<day>[\d]{2})(?:\s|,|\?|$)|(?P<month>[a-zA-Z]+)|(?P<year>[\d]{4})')

day = ""
month = ""
year = ""

for l in rgx.finditer(data):
    # Exactly one named group is set per match; remember which part it was.
    if l.group("day"):
        day = l.group("day")
    elif l.group("month"):
        month = l.group("month")
    elif l.group("year"):
        year = l.group("year")

    # Once all three parts are collected, print them and reset for the next date.
    if day and month and year:
        print("{0} {1} {2}".format(day, month, year))
        day = ""
        month = ""
        year = ""
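
The loop above keeps day, month and year across matches, so a line that is missing one part could pick it up from the following line. Here is a minimal sketch of a per-string variant, assuming each date arrives as its own string. The names DATE_PART and split_date are hypothetical and not part of the original answer; the pattern itself is unchanged.

```python
import re

# Same pattern as in the answer; DATE_PART and split_date are illustrative names.
DATE_PART = re.compile(
    r'(?P<day>[\d]{2})(?:\s|,|\?|$)|(?P<month>[a-zA-Z]+)|(?P<year>[\d]{4})'
)


def split_date(text):
    """Return (day, month, year) as strings, or None if any part is missing."""
    parts = {}
    for m in DATE_PART.finditer(text):
        # lastgroup is the name of the named group that matched ("day", "month" or "year").
        # setdefault keeps the first value seen for each part.
        parts.setdefault(m.lastgroup, m.group(m.lastgroup))
    if len(parts) != 3:
        return None
    return parts["day"], parts["month"], parts["year"]


for s in ["12 September, 2016", "September 12, 2016", "2016 September, 12"]:
    print(split_date(s))
```

Returning None for an incomplete string leaves it to the caller to decide whether to skip the input or raise an error.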


Marcs

Named groups, as suggested in another answer, are a good way of doing it (especially if you already have the regexes set up), but for completeness' sake here's how to handle it with the datetime module.

from datetime import datetime as date

def parse_date(s):
    formats = ["%d %B, %Y",
               "%B %d, %Y",
               "%Y %B, %d"]

    for f in formats:
        try:
            return date.strptime(s, f)
        except ValueError:
            pass

    raise ValueError("Invalid date format!")

arr = ["12 September, 2016",
       "September 12, 2016",
       "2016 September, 12",
       "12/9/2016"]

for s in arr:
    dt = parse_date(s)      
    print(dt.year, dt.strftime("%B"), dt.day)

"""

2016 September 12
2016 September 12
2016 September 12
Traceback (most recent call last):
  File "C:/Python33/datetest.py", line 22, in <module>
    dt = parse_date(s)
  File "C:/Python33/datetest.py", line 19, in parse_date
    raise ValueError("Invalid date format!")
ValueError: Invalid date format!

"""

For more information, see the datetime documentation page.
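
The question asks for day, month, year in that order, while the loop above prints year first. A small sketch of returning the parts in the requested order could look like the following. The names FORMATS and date_parts are hypothetical, and it assumes the default locale, where %B yields English month names.

```python
from datetime import datetime

# The three layouts from the question; FORMATS and date_parts are illustrative names.
FORMATS = ("%d %B, %Y", "%B %d, %Y", "%Y %B, %d")


def date_parts(s):
    """Return (day, month name, year) for the first layout that parses s."""
    for fmt in FORMATS:
        try:
            dt = datetime.strptime(s, fmt)
        except ValueError:
            continue  # try the next layout
        # %B is locale-dependent; English names are assumed here.
        return dt.day, dt.strftime("%B"), dt.year
    raise ValueError("Invalid date format!")


for s in ["12 September, 2016", "September 12, 2016", "2016 September, 12"]:
    print(date_parts(s))
```

Unlike the regex approach, this returns the day and year as integers, and it rejects impossible dates such as "31 February, 2016".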

damjan
  • You might want to cater for the case where the data doesn't match one of those three formats, otherwise you risk getting a `NameError` or, even worse, re-using the previous date when the current one doesn't match... – Jon Clements Sep 17 '16 at 13:03
  • Cool - alternatively - you can wrap it in a function like [here](https://stackoverflow.com/a/23581184) – Jon Clements Sep 17 '16 at 13:21
  • That certainly looks a lot more elegant. I'm new to SO, should I just go ahead and use your idea in my answer or would it be more appropriate to let you submit it as an answer yourself? – damjan Sep 17 '16 at 13:38
  • You're fine as is :) – Jon Clements Sep 17 '16 at 13:46

You cannot change the order of the groups. You need an "or" (alternation) of the three patterns, and then a pass over the result to determine which group matched which part, which should be pretty simple.
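
One way this idea might be sketched in Python is shown here. The names ANY_FORMAT and normalize, and the numbered group names, are hypothetical. Python's re module does not allow the same group name twice, so each alternative gets its own suffix.

```python
import re

# One alternative per layout; group names must be unique, hence the 1/2/3 suffixes.
ANY_FORMAT = re.compile(
    r'(?P<d1>\d{1,2}) (?P<m1>[A-Za-z]+), (?P<y1>\d{4})'
    r'|(?P<m2>[A-Za-z]+) (?P<d2>\d{1,2}), (?P<y2>\d{4})'
    r'|(?P<y3>\d{4}) (?P<m3>[A-Za-z]+), (?P<d3>\d{1,2})'
)


def normalize(text):
    """Return (day, month, year) as strings, or None if no layout matches."""
    m = ANY_FORMAT.fullmatch(text.strip())
    if m is None:
        return None
    groups = m.groupdict()
    # Only the groups of the alternative that matched are not None.
    for i in "123":
        if groups["d" + i] is not None:
            return groups["d" + i], groups["m" + i], groups["y" + i]
    return None


for s in ["12 September, 2016", "September 12, 2016", "2016 September, 12"]:
    print(normalize(s))
```

The pass over groupdict() is the "determine which group matched" step: whichever alternative matched, the caller always receives the parts in day, month, year order.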

Abhishek Patel