
I have a long string containing dates and would like to update the format of all the dates.

The following is what I have written along with pseudocode of the bit I cannot figure out:

import datetime

current_date_format = "%d/%m/%Y"
new_date_format = "%d/%b/%Y"

def main():
    line = "This is text dated 01/02/2017, and there are a few more dates such as 03/07/2017 and 09/06/2000"
    print(line)
    # Best way to pull out and replace all of the dates?
    # pseudo:
    for each current_date_format in line as date_in_line
        temp_date = fix_date(date_in_line)
        line.replace(date_in_line, temp_date)
    print(line)

def fix_date(date_string=''):
    return datetime.datetime.strptime(date_string, current_date_format).strftime(new_date_format)

In this case it should print:

This is text dated 01/02/2017, and there are a few more dates such as 03/07/2017 and 09/06/2000
This is text dated 01/FEB/2017, and there are a few more dates such as 03/JUL/2017 and 09/JUN/2000

Thanks

Jake
  • You could try using a regex to match the `dd/mm/YYYY` and a dictionary to map the numerical `mm` values to the respective string representation. However, there's probably something in the `datetime` library you have imported or maybe look at `pandas`. I've done some datetime manipulation with that and the scientific libraries always have a lot of support if you find something related. Edit: see https://stackoverflow.com/questions/3276180/extracting-date-from-a-string-in-python A post by @unutbu on that thread mentions something that may be useful to you – Darrel Holt Jul 17 '17 at 22:29
  • Thanks @DarrelHolt I'll take a look at `pandas` I was tossing up between `datetime` and `dateutil`. But liked the way `datetime` let you build your own format. – Jake Jul 17 '17 at 22:31
  • I did see that question, but unfortunately the `dateutil.parser` can only handle strings with one date. My strings will have 0-n dates. – Jake Jul 17 '17 at 22:53

1 Answer


The first suggestion below is not a complete solution; skip to the first EDIT section for working code.

You can do this with a few adjustments to your code. First, break the string into pieces:

line = "This is text dated 01/02/2017, and there are a few more dates such as 03/07/2017 and 09/06/2000"
words = line.split()  # by default it splits on whitespace

Now you can work with each piece of the input. Try to parse each word as a date with your fix_date function, then rebuild the string:

updated_words = []
for word in words:
    try:
        # fix_date raises ValueError when the word is not a date in current_date_format
        updated_words.append(fix_date(word))
    except ValueError:
        updated_words.append(word)
updated_line = ' '.join(updated_words)  # rebuild the string with single spaces
print(updated_line)

EDIT: Upon running this, I realized it fails when punctuation is attached to a date (for example, the trailing comma in "01/02/2017,"). I am making another pass.
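
The punctuation problem can be reproduced with a small check. This is only a sketch: parses_as_date is a hypothetical helper that is not part of the original code, and it simply reports whether strptime accepts a word as-is.

```python
import datetime

current_date_format = "%d/%m/%Y"

def parses_as_date(word):
    """Hypothetical helper: True if `word` matches current_date_format exactly."""
    try:
        datetime.datetime.strptime(word, current_date_format)
        return True
    except ValueError:
        # strptime rejects leftover characters such as a trailing comma
        return False

line = "This is text dated 01/02/2017, and 09/06/2000"
for word in line.split():
    print(repr(word), parses_as_date(word))
```

Because splitting on whitespace leaves the comma attached, the first date is rejected by strptime and would be copied through unchanged by the loop above.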

Here is some working code:

import datetime
import re

current_date_format = "%d/%m/%Y"
new_date_format = "%d/%b/%Y"

def main():
    line = "This is text dated 01/02/2017, and there are a few more dates such as 03/07/2017 and 09/06/2000"
    print(line)
    line = re.sub(r'\d{2}/\d{2}/\d{4}',fix_date,line)
    print(line)

def fix_date(rem):
    # re.sub passes a match object; group() returns the matched date text
    date_string = rem.group()
    return datetime.datetime.strptime(date_string, current_date_format).strftime(new_date_format)

main()
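
Two details may be worth handling, depending on the input. The pattern also matches digit groups that are not valid dates (such as 31/02/2017), which makes strptime raise ValueError, and %b produces "Feb" rather than the "FEB" shown in the question. The following is a minimal sketch of one way to deal with both. The names DATE_PATTERN, fix_date_safe and convert_dates are made up for this illustration, and it assumes invalid matches should be left untouched and that month abbreviations come from an English or default C locale.

```python
import datetime
import re

current_date_format = "%d/%m/%Y"
new_date_format = "%d/%b/%Y"

# \b stops the pattern from matching inside a longer run of digits
DATE_PATTERN = re.compile(r'\b\d{2}/\d{2}/\d{4}\b')

def fix_date_safe(match):
    """Reformat one matched date; return it unchanged if it is not a real date."""
    date_string = match.group()
    try:
        parsed = datetime.datetime.strptime(date_string, current_date_format)
    except ValueError:
        # Assumption: text that looks like a date but is invalid is left as-is
        return date_string
    # upper() turns "Feb" into "FEB" to match the output requested in the question
    return parsed.strftime(new_date_format).upper()

def convert_dates(text):
    """Replace every dd/mm/YYYY date in `text` with its dd/MON/YYYY form."""
    return DATE_PATTERN.sub(fix_date_safe, text)

print(convert_dates("Dated 01/02/2017, not a date: 31/02/2017"))
```

Drop the upper() call if the mixed-case abbreviation is acceptable.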

EDIT 2: The regex method works on very large strings just as well as on small ones, so if your file is small enough to load into memory all at once, you can process it in one shot:

import datetime
import re

current_date_format = "%d/%m/%Y"
new_date_format = "%d/%b/%Y"

def main():
    with open('my_file.txt','r') as f:
        text = f.read()
    with open('my_fixed_file.txt','w') as f:
        f.write(re.sub(r'\d{2}/\d{2}/\d{4}',fix_date,text))

def fix_date(rem):
    date_string = rem.group()
    return datetime.datetime.strptime(date_string, current_date_format).strftime(new_date_format)

main()

Or, even more compactly, combine the file read and write steps:

...
with open('my_file.txt', 'r') as f, open('my_fixed_file.txt', 'w') as f2:
    f2.write(re.sub(r'\d{2}/\d{2}/\d{4}', fix_date, f.read()))
...
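
If a file is too large to read comfortably in one go, the same substitution can be applied one line at a time. This is a sketch rather than a tested drop-in: convert_file and date_pattern are names introduced here for illustration, and the approach relies on a date never spanning a line break, which holds for this pattern.

```python
import datetime
import re

current_date_format = "%d/%m/%Y"
new_date_format = "%d/%b/%Y"

# Compile once so the pattern is reused for every line
date_pattern = re.compile(r'\d{2}/\d{2}/\d{4}')

def fix_date(rem):
    # rem is the match object supplied by re.sub
    return datetime.datetime.strptime(rem.group(), current_date_format).strftime(new_date_format)

def convert_file(src_path, dst_path):
    """Copy src_path to dst_path, reformatting dates one line at a time."""
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:  # iterating a file object yields one line per step
            dst.write(date_pattern.sub(fix_date, line))
```

It would be called as convert_file('my_file.txt', 'my_fixed_file.txt'), using the same file names as above.
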
Farmer Joe
  • Would this method scale well? I am just curious because it actually populates `line` as lines from `txt` files that can be a few thousand lines long. – Jake Jul 17 '17 at 22:45
  • @Jake I made an edit with some working code. This one makes use of the regex library called 're' and the fact that you can route matches found through the 'sub' function (for substitute). With the small edit I made to your function and use of the regex library you should be able to process reasonably large files. – Farmer Joe Jul 17 '17 at 22:55
  • @Jake You can probably rewrite the whole file all at once if you wanted, see the second edit I made. I tested it on a file made of your example repeated over and over on ~25000 lines and it was no problem (~1-2 sec). – Farmer Joe Jul 17 '17 at 23:01
  • wow thanks. This is perfect! Worked like a charm against the txt files (~20,000 lines each) and was very quick – Jake Jul 17 '17 at 23:05
  • @Jake Happy to help! – Farmer Joe Jul 17 '17 at 23:06