2

I have scraped some data that contains opening hours in 12-hour format. The string looks like this: Mon - Fri:,10:00 am - 7:00 pm. I need to extract the times 10:00 am and 7:00 pm and convert them to 24-hour format. The final string I want looks like this:

Mon - Fri:,10:00 - 19:00

Any help would be appreciated in this regard. I have tried the following:

import re

txt = 'Mon - Fri:,10:00 am - 7:00 pm'
data = re.findall(r'\s(\d{2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
print(data)

But neither this regex nor any other that I tried did the job.

ggorlen
rex sphinx

4 Answers

3

Your regex requires a whitespace character before the leading digit, which prevents ,10:00 am from matching, and it requires two digits before the colon, which fails to match 7:00 pm. r"(?i)(\d?\d:\d\d (?:a|p)m)" seems like the most precise option.

After that, parse the match using datetime.strptime and convert it to military time using the "%H:%M" format string. Any invalid time such as 10:67 will raise a ValueError (if you anticipate strings that should be ignored instead, adjust the regex to strictly match valid 12-hour times).

import re
from datetime import datetime

def to_military_time(x):
    return datetime.strptime(x.group(), "%I:%M %p").strftime("%H:%M")

txt = "Mon - Fri:,10:00 am - 7:00 pm"
data = re.sub(r"(?i)(\d?\d:\d\d (?:a|p)m)", to_military_time, txt)
print(data) # => Mon - Fri:,10:00 - 19:00
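
If invalid times should be left untouched rather than raise, one possible variation is to catch the ValueError inside the replacement callback. This is only a sketch: the helper name to_military_or_keep and the sample string with 10:67 are made up for illustration, and it assumes a locale where %p recognizes am/pm.

```python
import re
from datetime import datetime

TIME_RE = re.compile(r"(?i)\d?\d:\d\d (?:a|p)m")

def to_military_or_keep(match):
    # Hypothetical helper: convert a matched 12-hour time to 24-hour form,
    # or return the matched text unchanged if strptime rejects it.
    try:
        return datetime.strptime(match.group(), "%I:%M %p").strftime("%H:%M")
    except ValueError:
        return match.group()

# The invalid "10:67 am" is kept as-is; the valid "7:00 pm" is converted.
print(TIME_RE.sub(to_military_or_keep, "Sat:,10:67 am - 7:00 pm"))
```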
ggorlen
1

Your regex only looks for two-digit hours (\d{2}) preceded by whitespace (\s). The following also captures one-digit hours and allows a comma in place of the whitespace.

data = re.findall(r'[\s,](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)

However, you might want to consider all punctuation as valid:

data = re.findall(r'[\s!"#$%&\'\(\)*+,-./:;\<=\>?@\[\\\]^_`\{|\}~](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
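
As an alternative to listing every punctuation character, a negative lookbehind can simply require that the time is not preceded by another digit. This is a sketch of a different technique from the patterns above, and the name TIME_12H is an assumption for illustration.

```python
import re

# Assumed pattern: (?<!\d) rejects matches that start in the middle of a
# longer number, so no explicit list of separators is needed.
TIME_12H = re.compile(r"(?<!\d)(\d{1,2}:\d{2}\s?(?:am|pm))", re.IGNORECASE)

txt = 'Mon - Fri:,10:00 am - 7:00 pm'
data = TIME_12H.findall(txt)
print(data)  # ['10:00 am', '7:00 pm']
```

The re.IGNORECASE flag also covers mixed-case suffixes such as Am or pM, which the explicit AM|PM|am|pm alternation does not.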
tmrlvi
1

The regex needs to be changed, like this:

import re

text = 'Mon - Fri:,10:00 am - 7:00 pm'
result = re.match(r'\D* - \D*:,([\d\s\w:]+) - ([\d\s\w:]+)', text)
print(result.group(1))
# it will print 10:00 am
print(result.group(2))
# it will print 7:00 pm

You need a quantifier such as '+' or '*' to tell the regex to match multiple characters; if you only use \s, it will match exactly one character.
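
To get from the two captured groups to the final 24-hour string, the groups can be parsed with datetime.strptime and stitched back onto the prefix. This is a rough sketch; the helper name to_24h is hypothetical, and it assumes the input always has the layout shown above.

```python
import re
from datetime import datetime

def to_24h(value):
    # Hypothetical helper: "7:00 pm" -> "19:00"
    return datetime.strptime(value.strip(), "%I:%M %p").strftime("%H:%M")

text = 'Mon - Fri:,10:00 am - 7:00 pm'
result = re.match(r'\D* - \D*:,([\d\s\w:]+) - ([\d\s\w:]+)', text)

# Everything before the first captured group is the "Mon - Fri:," prefix.
prefix = text[:result.start(1)]
final = prefix + to_24h(result.group(1)) + " - " + to_24h(result.group(2))
print(final)  # Mon - Fri:,10:00 - 19:00
```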

You can learn more about regex here:

https://regexr.com/

And you can try out regexes online here:

https://regex101.com/

Chih Sean Hsu
1

Why not use the time module?

import time
data = "Mon - Fri:,10:00 am - 7:00 pm"
parts = data.split(",")
days = parts[0]
hours = parts[1]
parts = hours.split("-")
t1 = time.strptime(parts[0].strip(), "%I:%M %p")
t2 = time.strptime(parts[1].strip(), "%I:%M %p")
result = days + "," + time.strftime("%H:%M", t1) + " - " + time.strftime("%H:%M", t2)
print(result)

Output:

Mon - Fri:,10:00 - 19:00
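
If several scraped lines need converting, the same steps could be wrapped in a function. This is a sketch only: convert_hours is a made-up name, and it assumes every line has the form "<days>,<start> - <end>" with exactly one comma before the hours.

```python
import time

def convert_hours(line):
    # Hypothetical helper; assumes the "<days>,<start> - <end>" layout.
    days, _, hours = line.partition(",")
    start, _, end = hours.partition("-")

    def to_24h(value):
        parsed = time.strptime(value.strip(), "%I:%M %p")
        return time.strftime("%H:%M", parsed)

    return days + "," + to_24h(start) + " - " + to_24h(end)

print(convert_hours("Mon - Fri:,10:00 am - 7:00 pm"))  # Mon - Fri:,10:00 - 19:00
```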
Ionut Ticus