I'm attempting to read data from multiple text files and move the data into a two-dimensional array. The data needs to remain in a specific order.


Could regex assist with this?


If you have any insight on how to improve this section of the code, please let me know.

3 Answers


The datetime module provides (almost) everything date-related:

from datetime import datetime

date = "Sat 30-Mar-1996 7:40 PM"
fmt = "%a %d-%b-%Y %I:%M %p"
a = datetime.strptime(date, fmt)
print(a.year)  # 1996
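
If some of the strings in your files might not match the format, a small wrapper can keep the loop from crashing. This is only a sketch: the helper name parse_timestamp is made up for illustration, and the format string is assumed to be the one from the example above. Also note that %a and %b depend on the locale, so this assumes English day and month names.

```python
from datetime import datetime

# Assumed format, taken from the example above.
FMT = "%a %d-%b-%Y %I:%M %p"

def parse_timestamp(text, fmt=FMT):
    """Return a datetime, or None if text does not match fmt (hypothetical helper)."""
    try:
        return datetime.strptime(text.strip(), fmt)
    except ValueError:
        # strptime raises ValueError when the string does not fit the format.
        return None

stamp = parse_timestamp(" Sat 30-Mar-1996 7:40 PM ")
print(stamp.year, stamp.month, stamp.hour)  # 1996 3 19
print(parse_timestamp("not a date"))        # None
```

Since datetime objects compare chronologically, keeping them in your rows also makes it easy to sort later if you need to.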
diggusbickus

You can parse the date-time string very easily by splitting its components and using iterable unpacking, e.g.,

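def parse_date(d):
    # Expects a string such as "Sat 30-Mar-1996 7:40 PM"
    day_of_week, date, hhmm, ampm = d.split()
    day_of_month, month, year = date.split('-')
    hour, minute = hhmm.split(':')
    # Convert to 24-hour time: 12 AM -> 0, 12 PM -> 12, other PM hours -> +12
    hour24 = int(hour) % 12 + (12 if ampm == 'PM' else 0)
    return (year, month, day_of_month,
            str(hour24), minute,
            day_of_week)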

and later, in the body of the loop

year, month, dom, hour, minute, dow = parse_date(fields[-1].strip())

or, if you are interested only in year

year, *_ = parse_date(fields[-1].strip())
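
To show how this could fit into the loop, here is a rough sketch. The sample lines and the comma delimiter are assumptions on my part, since I don't know your actual file layout. It repeats a working variant of parse_date (with a 24-hour conversion) so that the block runs on its own.

```python
def parse_date(d):
    # Expects a string such as "Sat 30-Mar-1996 7:40 PM"
    day_of_week, date, hhmm, ampm = d.split()
    day_of_month, month, year = date.split('-')
    hour, minute = hhmm.split(':')
    # 12 AM -> 0, 12 PM -> 12, other PM hours -> +12
    hour24 = int(hour) % 12 + (12 if ampm == 'PM' else 0)
    return (year, month, day_of_month, str(hour24), minute, day_of_week)

# Hypothetical input lines; the real delimiter and columns are assumptions.
lines = [
    "alpha,1,Sat 30-Mar-1996 7:40 PM",
    "beta,2,Sun 31-Mar-1996 12:05 AM",
]

rows = []
for line in lines:
    fields = line.split(',')
    year, *_ = parse_date(fields[-1].strip())
    # Keep the other fields in their original order and append the year.
    rows.append(fields[:-1] + [year])

print(rows)  # [['alpha', '1', '1996'], ['beta', '2', '1996']]
```

Appending rows in the order the lines are read keeps the original ordering of the data.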
gboffi

You may be looking for regular expressions (regex), which are a powerful way to analyze and extract data from strings. For an introduction, I'd check out this site or the Python docs. In your case, a pattern along the lines of r'([a-zA-Z]+) ([0-9]+)-([a-zA-Z]+)-([0-9]+) ([0-9:]+ [a-zA-Z]+)' should work. Note that an unescaped | means alternation in a regex, so if the date is delimited by literal | characters in the file, they must be written as \| in the pattern. A more specific description of the time format would be necessary for a 100% correct regex.

To use regex in Python, you want the re library. First, compile the pattern with matcher = re.compile(your_regex_string_here). Then find the first match anywhere in the text with result = matcher.search(file_contents); note that matcher.match only matches at the very start of the string. (You could also just do result = re.search(regex_string, file_contents).) If nothing matches, result is None. Anything surrounded by parentheses in the regex is a "capturing group", which can be extracted from the result with result.group(): result.group(0) returns the full match, and result.group(n) returns the contents of the nth capturing group, that is, the nth set of parentheses. In the above example, result.group(4) would return the year, and groups 1-5 give the day of the week, day, month, year, and time respectively.
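
Here is a minimal sketch of the steps above. The sample line is hypothetical, and the pattern assumes the timestamp looks like "Sat 30-Mar-1996 7:40 PM"; adjust both to your real data.

```python
import re

# Assumed layout: weekday, day-month-year, then time with AM/PM.
pattern = re.compile(
    r"([A-Za-z]+) (\d+)-([A-Za-z]+)-(\d+) (\d+:\d+ [A-Za-z]+)"
)

# Hypothetical line; the surrounding text and delimiters are made up.
text = "record 17 | Sat 30-Mar-1996 7:40 PM | done"

# search() scans the whole string; match() would only try position 0.
result = pattern.search(text)
if result:
    print(result.group(0))  # Sat 30-Mar-1996 7:40 PM
    print(result.group(4))  # 1996
    print(result.group(5))  # 7:40 PM
```

Checking result before calling group() matters, because search returns None when the line contains no timestamp.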

The datetime module, as mentioned in another answer, is also a great tool.

minerharry
  • No, you don't need regex for this. You need to parse the date/time, which is already in a list. – MattDMo Aug 21 '21 at 21:07
  • And that's a terrible regex to use, it'll capture everything. Don't try to rewrite the `datetime` module. Use the right tool for the job, and regex is NOT the right tool for this job. – MattDMo Aug 21 '21 at 21:09
  • Just look at [this](https://regex101.com/r/gmMuvf/1) on regex101.com. Did you even think about testing your regex before posting it? And I was a little off - it doesn't match *every*thing, it only matches between every single character, so all the results will be `None`. Nice job. – MattDMo Aug 21 '21 at 21:13