I'm trying to change all date values in an XML file and then add or subtract a user-specified amount of time from the timestamps.

The timestamps all have the format 2016-06-29T17:03:39.000Z. However, they are not all enclosed in the same tags.

My XML looks something like this:

<Id>2016-06-29T17:03:37.000Z</Id>
<Lap StartTime="2016-06-29T17:03:37.000Z">
<TotalTimeSeconds>6906</TotalTimeSeconds>
<DistanceMeters>60870.5</DistanceMeters>
<Intensity>Active</Intensity>
<TriggerMethod>Manual</TriggerMethod>
<Track>
<Trackpoint>
<Time>2016-06-29T17:03:37.000Z</Time>

I want to go through the XML file line by line and search for the date/time string. First I want to find and replace the date, and second I want to add or subtract some amount of time from the timestamp.

This is my code so far:

import re
import xml.etree.ElementTree as et

name_file = 'test.txt' 
fh = open(name_file, "r")
filedata = fh.read()
fh.close()

filedata = filedata.split()
for line  in filedata:
    cur_date = re.findall('\d{4}[-/]\d{2}[-/]\d{2}', line)
    print cur_date

Does anyone have an idea on how to do this?

4 Answers

You can use this:

(?P<YEAR>[\d]{4})-(?P<MONTH>([0][1-9])|([1][0-2]))-(?P<DAY>([0][1-9])|([12][0-9])|([3][01]))T(?P<HOUR>([01][0-9])|([2][0-3])):(?P<MINUTES>([0-5][0-9])):(?P<SECONDS>([0-5][0-9]))\.(?P<MILLIS>[0-9]{3})Z

And then you can access the named groups on a match object (for example, one returned by re.search() rather than the list returned by re.findall()) like this:

cur_date.group('YEAR')
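
As a rough illustration of how the named groups could feed into a time calculation, here is a minimal Python 3 sketch. It uses a simplified pattern with the same group names (it does not validate ranges the way the regex above does), and the names TIMESTAMP and shift_timestamp are hypothetical, introduced only for this example. Converting the groups to a datetime lets timedelta handle the hour and day rollover.

```python
import re
from datetime import datetime, timedelta

# Simplified pattern with the same group names (assumption: no range validation).
TIMESTAMP = re.compile(
    r'(?P<YEAR>\d{4})-(?P<MONTH>\d{2})-(?P<DAY>\d{2})T'
    r'(?P<HOUR>\d{2}):(?P<MINUTES>\d{2}):(?P<SECONDS>\d{2})\.(?P<MILLIS>\d{3})Z'
)

def shift_timestamp(text, delta):
    """Return the first timestamp found in text, shifted by delta (None if absent)."""
    m = TIMESTAMP.search(text)
    if m is None:
        return None
    # Convert every named group to an integer.
    parts = {name: int(value) for name, value in m.groupdict().items()}
    moment = datetime(parts['YEAR'], parts['MONTH'], parts['DAY'],
                      parts['HOUR'], parts['MINUTES'], parts['SECONDS'],
                      parts['MILLIS'] * 1000)  # milliseconds -> microseconds
    moment += delta  # timedelta takes care of hour/day/month rollover
    # %f prints six digits; drop the last three to get milliseconds again.
    return moment.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'

print(shift_timestamp('<Time>2016-06-29T23:30:00.000Z</Time>', timedelta(minutes=55)))
# 2016-06-30T00:25:00.000Z
```

Adding 55 minutes to 23:30 crosses midnight here, so both the hour and the day change without any manual carry logic.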

P.S. You can see live demo here: https://regex101.com/r/mA1rY4/1

Maria Ivanova
  • I really like the simplicity of this solution, although I don't know how I would be able to add/subtract any amount of time here. For instance, if I want to add 55 minutes, sometimes the hours will change, at other points they will not. How will this work when I split the timestamp? – Max_the_Roos Jun 30 '16 at 14:27
  • I am not really well-versed in Python, but, using `cur_date.group('YEAR')`, `cur_date.group('MONTH')`, etc. and converting the values to integers, you should be able to calculate the time. Perhaps the datetime object could also help. Here is more info: [http://www.saltycrane.com/blog/2009/05/converting-time-zones-datetime-objects-python/](http://www.saltycrane.com/blog/2009/05/converting-time-zones-datetime-objects-python/) Also, in the replace function of regex, you can manipulate the datetime in any desirable format. (`${YEAR}` will give you the year and so on). – Maria Ivanova Jun 30 '16 at 14:36
  • This here could also help: [http://stackoverflow.com/questions/8777753/converting-datetime-date-to-utc-timestamp-in-python](http://stackoverflow.com/questions/8777753/converting-datetime-date-to-utc-timestamp-in-python) – Maria Ivanova Jun 30 '16 at 14:39

Use this regex to find all the dates:

\d{4}[-/]\d{2}[-/]\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z

filedata = filedata.split()
for i, line in enumerate(filedata):
    cur_date = re.findall(r'\d{4}[-/]\d{2}[-/]\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z', line)
    print(cur_date)
    for match in cur_date:
        # str.replace() returns a new string, so keep the result
        line = line.replace(match, updateDate(match))
    filedata[i] = line  # store the updated line back in the list

You just need to write an updateDate() function that performs the update you want. In this function you can use the same regex, but this time with capturing groups, i.e. parentheses ().

I think it is easier to split the work into two parts.
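
One possible shape for such an updateDate() function is sketched below in Python 3. Instead of capturing groups it uses datetime.strptime() to parse the whole timestamp, which is a swapped-in technique rather than what the answer describes. The names TIMESTAMP_RE, FORMAT, OFFSET, and update_line are hypothetical, and the offset of minus two and a half hours is only an example value.

```python
import re
from datetime import datetime, timedelta

TIMESTAMP_RE = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z')
FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'
OFFSET = timedelta(hours=-2, minutes=-30)  # assumed example offset

def updateDate(stamp, offset=OFFSET):
    """Parse one timestamp string, shift it by offset, and format it back."""
    moment = datetime.strptime(stamp, FORMAT) + offset
    # %f prints microseconds (six digits); trim to milliseconds and re-append Z.
    return moment.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'

def update_line(line, offset=OFFSET):
    """Replace every timestamp on a line with its shifted version."""
    return TIMESTAMP_RE.sub(lambda m: updateDate(m.group(0), offset), line)

print(update_line('<Lap StartTime="2016-06-29T17:03:37.000Z">'))
# <Lap StartTime="2016-06-29T14:33:37.000Z">
```

Using re.sub() with a callback replaces each match at its own position, which avoids the find-then-replace round trip and leaves lines without timestamps untouched.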

baddger964
  • replacement of the date worked great! Thank you, however, manipulating the time is still tricky. Also, do you by any chance have advice on how to best insert it back in the XML? – Max_the_Roos Jun 30 '16 at 14:42

Assuming we can ignore that the timestamps are embedded in XML in this case, you could adjust them using re.sub():

#!/usr/bin/env python2
import datetime as DT
import fileinput
import re

timestamp_regex = r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})\.(\d{3})Z'

def add_two_days(m):
    numbers = map(int, m.groups())
    numbers[-1] *= 1000  # milliseconds -> microseconds
    try:
        utc_time = DT.datetime(*numbers)
    except ValueError:
        return m.group(0) # leave an invalid timestamp as is
    else:
        utc_time += DT.timedelta(days=2) # add 2 days
        return utc_time.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'

replace_time = re.compile(timestamp_regex).sub
for line in fileinput.input('test.xml', backup='.bak', inplace=1, bufsize=-1):
    print replace_time(add_two_days, line),

To make working with the timestamps easier, they are converted to datetime objects. You can adjust the time using timedelta() here.

fileinput.input(inplace=1) changes the input file in place (print writes to the file in this case). The original file is backed up to a file with the same name plus the .bak extension. See How to search and replace text in a file using Python?
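
The script above targets Python 2 (print statement, list-returning map()). For readers on Python 3, the same idea might look roughly like the sketch below. It takes the offset as a parameter instead of hard-coding two days, and the function names shift and shift_file_in_place are hypothetical. The bufsize argument is left out on the assumption that recent Python 3 releases no longer accept it.

```python
import fileinput
import re
from datetime import datetime, timedelta

TIMESTAMP_RE = re.compile(
    r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})\.(\d{3})Z')

def shift(m, delta):
    """Return the matched timestamp shifted by delta."""
    numbers = [int(g) for g in m.groups()]
    numbers[-1] *= 1000  # milliseconds -> microseconds
    try:
        utc_time = datetime(*numbers)
    except ValueError:
        return m.group(0)  # leave an invalid timestamp as is
    return (utc_time + delta).strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'

def shift_file_in_place(path, delta):
    """Rewrite path in place, keeping the original as path + '.bak'."""
    with fileinput.input(files=[path], inplace=True, backup='.bak') as lines:
        for line in lines:
            # Inside the loop, print() writes to the file, not the console.
            print(TIMESTAMP_RE.sub(lambda m: shift(m, delta), line), end='')
```

A call such as shift_file_in_place('test.xml', timedelta(minutes=-55)) would subtract 55 minutes from every timestamp in the file.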

jfs

I finally solved the issue with the following code (it might not be 100% optimal, but it works..):

import re
import xml.etree.ElementTree as et
import datetime

name_file = 'test.gpx' #raw_input("Name of the file, including .txt at the end: ")
nieuwe_datum = '2016-06-30' #raw_input("New date, format YYYY-MM-DD: ")
new_start_time = '14:45:00' #raw_input("Start time, format hh:mm:ss: ")
new_start_time = datetime.datetime.strptime(new_start_time, "%H:%M:%S")
fh = open(name_file, "r")
filedata = fh.read()
fh.close()
outfile = open('output.gpx', 'w')

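time_list = list()

filedata = filedata.split()
for i, line in enumerate(filedata):
    cur_date = re.findall('\d{4}[-/]\d{2}[-/]\d{2}', line)
    for match1 in cur_date:
        line = line.replace(match1, nieuwe_datum)
    filedata[i] = line  # store the new date so the write loops below see it
    cur_time = re.findall('\d{2}:\d{2}:\d{2}.\d{3}', line)
    for match in cur_time:
        time_list.append(match)
# zero-padded hh:mm:ss.fff strings sort chronologically, so min() is the earliest time
cur_start_time = min(time_list)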
print 'current start time: '
print cur_start_time
print 'new start time: '
print new_start_time
cur_start_time = datetime.datetime.strptime(cur_start_time, "%H:%M:%S.%f")
if cur_start_time > new_start_time:
    time_dif = (cur_start_time - new_start_time)
    print 'time difference is: ' 
    print time_dif
    for line in filedata:
        cur_time = re.findall('\d{2}:\d{2}:\d{2}.\d{3}', line)
        for match2 in cur_time:
            new_time = datetime.datetime.strptime(match2, "%H:%M:%S.%f")
            new_time = new_time - time_dif
            new_time = re.findall('\d{2}:\d{2}:\d{2}', str(new_time))
            line = line.replace(match2, new_time[0])
        line = line + "\n"
        outfile.write(line) 
        #print line 
else:
    time_dif = new_start_time - cur_start_time
    print 'time difference is: '
    print time_dif
    for line in filedata:
        cur_time = re.findall('\d{2}:\d{2}:\d{2}.\d{3}', line)
        for match2 in cur_time:
            new_time = datetime.datetime.strptime(match2, "%H:%M:%S.%f")
            new_time = new_time + time_dif
            new_time = re.findall('\d{2}:\d{2}:\d{2}', str(new_time))
            line = line.replace(match2, new_time[0])
        line = line + "\n"
        outfile.write(line) 
        #print line 
print 'New start date is: '
print nieuwe_datum
outfile.close()
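
The script above handles the date and the time of day separately, so a shift that crosses midnight would not carry over into the date. A possible alternative, sketched here in Python 3 and assuming every timestamp has exactly the format shown in the question, is to parse each full timestamp, compute one signed offset from the earliest one to the desired start, and apply it everywhere. The names parse, render, and rebase are hypothetical.

```python
import re
from datetime import datetime

TIMESTAMP_RE = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z')
FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'

def parse(stamp):
    """Turn a timestamp string into a datetime."""
    return datetime.strptime(stamp, FORMAT)

def render(moment):
    """Format a datetime back into the original millisecond format."""
    return moment.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'

def rebase(text, new_start):
    """Shift every timestamp so that the earliest one equals new_start."""
    stamps = TIMESTAMP_RE.findall(text)
    if not stamps:
        return text  # nothing to shift
    # A timedelta can be negative, so one code path covers both directions.
    offset = new_start - min(parse(s) for s in stamps)
    return TIMESTAMP_RE.sub(lambda m: render(parse(m.group(0)) + offset), text)

sample = ('<Id>2016-06-29T17:03:37.000Z</Id>\n'
          '<Time>2016-06-29T17:03:39.000Z</Time>\n')
print(rebase(sample, datetime(2016, 6, 30, 14, 45, 0)), end='')
# <Id>2016-06-30T14:45:00.000Z</Id>
# <Time>2016-06-30T14:45:02.000Z</Time>
```

Because the substitution runs on the whole text, the original whitespace and line breaks are kept, and the milliseconds survive the round trip.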