
I'm somewhat new to Python, and for this assignment we were asked to use a single regular expression to solve each prompt. I've finished prompts A-C, but now I'm stuck on prompt D. Here's the prompt:

d. A substitution, using a regular expression, that converts a date in either the format “May 29, 2019” or “May 29 2019” to “29 May 19”.

A valid date format to match has these elements:

  • The month must be the common three-letter month abbreviation, beginning with a capital letter followed by two lowercase letters: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec.
  • The day may be one or two digits. It is not necessary to check for a valid day, and dates with leading zeros such as 03 are acceptable.
  • The year is exactly four digits.
  • The month and day are separated by one or more spaces. The day and year are also separated by one or more spaces, but an optional comma immediately after the day is permitted (no spaces between the day and the comma are permitted).

What I'm stuck on: I'm not sure what to put in the r"..." replacement string (subStr in the code below). With what I have now, I get the error "re.error: bad escape \w at position 0". If we could fix the error, or find another way to do it while keeping subStr = r"...", I would really appreciate it. Thank you!

Note: my re.compile pattern worked fine before I changed the substitution string to produce the new output; it accepted the case, it just didn't convert it, because I hadn't written the conversion string yet. Also, the way I'm processing dates at the moment isn't very conventional; I plan on working on that after getting something that works.

Code:

import re

d = re.compile(r"^((Jan)\s+[1-31],\s+\d{4})$|"
               r"^((Jan)\s+[1-31]\s+\d{4})$|"
               r"^((Feb)\s+[1-28],\s+\d{4})$|"
               r"^((Feb)\s+[1-28]\s+\d{4})$|"
               r"^((Feb)\s+[1-29],\s+\d{4})$|" #ask prof about leap years
               r"^((Feb)\s+[1-29]\s+\d{4})$|"  #ask prof about leap years
               r"^((Mar)\s+[1-31],\s+\d{4})$|"
               r"^((Mar)\s+[1-31]\s+\d{4})$|"
               r"^((Apr)\s+[1-30],\s+\d{4})$|"
               r"^((Apr)\s+[1-30]\s+\d{4})$|"
               r"^((May)\s+[1-31],\s+\d{4})$|"
               r"^((May)\s+[1-31]\s+\d{4})$|"
               r"^((Jun)\s+[1-30],\s+\d{4})$|"
               r"^((Jun)\s+[1-30]\s+\d{4})$|"
               r"^((Jul)\s+[1-31],\s+\d{4})$|"
               r"^((Jul)\s+[1-31]\s+\d{4})$|"
               r"^((Aug)\s+[1-31],\s+\d{4})$|"
               r"^((Aug)\s+[1-31]\s+\d{4})$|"
               r"^((Sep)\s+[1-30],\s+\d{4})$|"
               r"^((Sep)\s+[1-30]\s+\d{4})$|"
               r"^((Oct)\s+[1-31],\s+\d{4})$|"
               r"^((Oct)\s+[1-31]\s+\d{4})$|"
               r"^((Nov)\s+[1-30],\s+\d{4})$|"
               r"^((Nov)\s+[1-30]\s+\d{4})$|"
               r"^((Dec)\s+[1-31],\s+\d{4})$|"
               r"^((Dec)\s+[1-31]\s+\d{4})$")

subStr = r"\w\s\d{1,2}\s\d{4}"

print("----Part d tests that match (and should change):")
print(d.sub(subStr, "May 29, 2019"))

print("----Part d tests that match (and should remain unchanged):")
print(d.sub(subStr, "May 29 19"))

Expected output:

----Part d tests that match (and should change):
29 May 19
----Part d tests that match (and should remain unchanged):
May 29 19

Actual output (with the substitution string left blank, and with the current code):

Blank:
----Part d tests that match (and should change):
May 29, 2019
----Part d tests that match (and should remain unchanged):
May 29 19

--------------------------------
Current:
----Part d tests that match (and should change):
    this = chr(ESCAPES[this][1])
KeyError: '\\w'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Xavier/PycharmProjects/hw7/hw7.py", line 101, in <module>
    print(d.sub(subStr, "May 29, 2019"))
  File "C:\Users\Xavier\AppData\Local\Programs\Python\Python37\lib\re.py", line 309, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Users\Xavier\AppData\Local\Programs\Python\Python37\lib\re.py", line 300, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "C:\Users\Xavier\AppData\Local\Programs\Python\Python37\lib\sre_parse.py", line 1024, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \w at position 0

    These parts, `[1-31]`, do not work like that: it is a [character class](https://www.regular-expressions.info/charclass.html) that matches a single `1`, `2` or `3` (a range from 1 to 3, plus another 1 that the range already covers). You might look at [this page](https://stackoverflow.com/questions/15491894/regex-to-validate-date-format-dd-mm-yyyy) to match a date-like format. – The fourth bird May 31 '19 at 07:05
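To see what the comment describes, here is a small check (the variable names are illustrative) of what [1-31] actually matches, plus one branch copied from the question's pattern. It suggests why the "Blank" run left "May 29, 2019" unchanged: the pattern never matched a two-digit day in the first place.

```python
import re

# [1-31] is a character class: it matches ONE character from {1, 2, 3}
# (the range 1-3 plus a redundant 1), not a number between 1 and 31.
day_class = re.compile(r"[1-31]")
matched_days = [str(n) for n in range(32) if day_class.fullmatch(str(n))]
print(matched_days)  # ['1', '2', '3']

# One branch from the question's pattern: a one-digit day works,
# but the two-digit day in "May 29, 2019" does not.
may_branch = re.compile(r"^((May)\s+[1-31],\s+\d{4})$")
print(bool(may_branch.match("May 2, 2019")))   # True
print(bool(may_branch.match("May 29, 2019")))  # False
```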

2 Answers


Hints:

  • (Jan|Feb|Mar) matches and captures the month...extend that for all months.
  • Square brackets match a single character...[1-31] is effectively [123]...the range 1-3 or 1 (redundant). [0-9] or just \d matches any single digit. The requirements said the date does not need to be validated, so \d{1,2} (match 1 or two digits) should be legal.
  • ? is used for 0 or 1 match so ,? is an optional comma.
  • 4-digit year, but only capture last two: \d{2}(\d{2}).
  • You should have three capture groups in the match string. \n where n is the group number inserts what was captured, so the replacement is just r'\2 \1 \3'.
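A minimal sketch of how those hints could fit together into one pattern. The names DATE_RE and convert are illustrative, and anchoring with ^...$ assumes each input string holds exactly one date:

```python
import re

# One pattern built from the hints: month, day, and last two digits of the year.
DATE_RE = re.compile(
    r"^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"  # group 1: month
    r"\s+(\d{1,2}),?"     # group 2: 1-2 digit day, optional comma right after it
    r"\s+\d{2}(\d{2})$"   # group 3: last two digits of the four-digit year
)


def convert(s: str) -> str:
    # \2 \1 \3 reorders to "day month yy"; non-matching strings come back unchanged.
    return DATE_RE.sub(r"\2 \1 \3", s)


for s in ["May 29, 2019", "May 29 2019", "May 29 19", "May 29 , 2019"]:
    print(repr(s), "->", repr(convert(s)))
```

The last two inputs stay unchanged: "May 29 19" has only a two-digit year, and "May 29 , 2019" has a space before the comma, which the rules forbid.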
Mark Tolonen

If using regex is not mandatory, I would instead use pandas.to_datetime or time.strptime:

pandas

import pandas as pd

s = "Jun 29, 2019"

try:
    # %y gives the two-digit year the prompt asks for (%Y would give 2019)
    print(pd.to_datetime(s).strftime('%d %b %y'))
except ValueError:
    print('unrecognized time format!')

%b is the abbreviated month name; see the strftime docs for the complete list of directives.
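Since the prompt wants "19" rather than "2019", the year directive matters as well: %Y is the four-digit year and %y the two-digit one. A quick check with the standard datetime module (the month name from %b depends on the locale; this assumes Python's default C locale):

```python
from datetime import datetime

when = datetime(2019, 6, 29)
print(when.strftime('%d %b %Y'))  # 29 Jun 2019  (%Y: four-digit year)
print(when.strftime('%d %b %y'))  # 29 Jun 19    (%y: two-digit year)
```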

time

Or, if you don't have pandas installed, use the built-in time module:

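import time

s = "Jun 29, 2019"  # same example string as above

out = None
for pattern in ['%b %d, %Y', '%b %d %Y']:  # with and without the comma
    try:
        # %y in the output gives the two-digit year the prompt asks for
        out = time.strftime('%d %b %y', time.strptime(s, pattern))
        break  # stop at the first pattern that parses
    except ValueError:
        continue

if out is None:
    print('Error: Could not read the time')
else:
    print(out)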

regex

If you do have to use a regex for this, you need to replace [1-31] with something like (?:[12]\d|3[01]|\d) (see regex tester).
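It can help to see exactly which day strings that alternative accepts. A small check with re.fullmatch (the candidate list is illustrative) shows it covers 1 to 31 as intended, but it also accepts 0 and rejects zero-padded days such as 03, which the assignment says are acceptable:

```python
import re

# The day alternative from the answer, applied to whole strings.
DAY = re.compile(r"(?:[12]\d|3[01]|\d)")

candidates = ["0", "7", "03", "10", "29", "30", "31", "32"]
accepted = [c for c in candidates if DAY.fullmatch(c)]
print(accepted)  # ['0', '7', '10', '29', '30', '31']
```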

Then, you're using re.sub incorrectly: the replacement string is not a pattern, so escapes like \w are not allowed there. Instead, add capturing groups to the big regex and use \1, \2, ... in the replacement string to put the captured text back in.
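A short demonstration of the difference, using a simplified pattern that only handles May (an assumption to keep it small): a \w in the replacement raises re.error, while group references insert the captured text.

```python
import re

pattern = re.compile(r"(May)\s+(\d{1,2}),?\s+\d{2}(\d{2})")

# Escapes such as \w are not valid in a replacement template.
error_message = None
try:
    pattern.sub(r"\w", "May 29, 2019")
except re.error as exc:
    error_message = str(exc)
print("re.error:", error_message)  # re.error: bad escape \w at position 0

# Group references (\1, \2, ... or \g<1>) insert what each group captured.
converted = pattern.sub(r"\2 \1 \3", "May 29, 2019")
print(converted)  # 29 May 19
```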

Edit

The only way I can think of to both use a regex and check for valid days would be

(?:(Jan|Mar|May|Jul|Aug|Oct|Dec)\s+(3[01]|[12]\d|\d)|(Apr|Jun|Sep|Nov)\s+(30|[12]\d|\d)|(Feb)\s+(2[0-9]|[1]\d|\d)),?\s+\d{2}(\d{2})

and using

subStr = r'\2\4\6 \1\3\5 \7'

which is incredibly ugly and does not handle leap years.
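A runnable check of that month-aware approach. This sketch uses \s+ between the fields (the rules allow one or more spaces), captures only the last two year digits as group 7, and orders the replacement as day-month-year to match the prompt's "29 May 19". The names VALIDATING and convert are illustrative. As noted, Feb 29 is accepted in any year.

```python
import re

# Month-aware pattern: each month family has its own day range.
VALIDATING = re.compile(
    r"(?:(Jan|Mar|May|Jul|Aug|Oct|Dec)\s+(3[01]|[12]\d|\d)"  # groups 1, 2
    r"|(Apr|Jun|Sep|Nov)\s+(30|[12]\d|\d)"                 # groups 3, 4
    r"|(Feb)\s+(2[0-9]|[1]\d|\d))"                         # groups 5, 6
    r",?\s+\d{2}(\d{2})"                                   # group 7: two-digit year
)


def convert(s: str) -> str:
    # Only one month/day pair takes part in a match; since Python 3.5,
    # groups that did not participate are substituted as empty strings.
    return VALIDATING.sub(r"\2\4\6 \1\3\5 \7", s)


for s in ["May 29, 2019", "Jun 30 2019", "Jun 31 2019", "Feb 30, 2019"]:
    print(repr(s), "->", repr(convert(s)))
```

Invalid days such as "Jun 31" or "Feb 30" simply fail to match, so re.sub returns the input unchanged.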

Snow bunting