I am trying to read a comment from my XML file but don't know how. The XML itself has no time data; the only timestamp is in the comment. I want to grab 18 JAN 2023 1:40:55 PM and convert it into an epoch timestamp. Can someone help me?

<!-- My Movies, Generated 18 JAN 2023  1:40:55 PM  --> 
<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A scientific fiction</description>
</movie>
</collection>
Nelly Yuki
  • What have you tried so far? What are you using to parse the XML data? – larsks Jan 18 '23 at 19:46
  • @larsks I am using python3.9 and I have tried the solution in this post: https://dustinoprea.com/2019/01/22/python-parsing-xml-and-retaining-the-comments/ – Nelly Yuki Jan 18 '23 at 19:47

1 Answer

Unfortunately, parsing this XML with the regular xml.etree module in the usual way does not help: the tree it builds starts at the root element, and comments are discarded by default, so the comment that sits before the root never shows up in the parsed tree.
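That said, if you'd rather stay with the standard library parser, a custom parser target can still see the comment even though the tree does not. This is a minimal sketch, assuming CPython's xml.etree.ElementTree, where XMLParser forwards each comment to a target that defines a comment() method. The CommentCollector class and the shortened inline sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for app.xml (shortened for illustration)
xml_text = """<!-- My Movies, Generated 18 JAN 2023  1:40:55 PM  -->
<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <year>2003</year>
</movie>
</collection>"""


class CommentCollector:
    """Hypothetical parser target that only records comment text."""

    def __init__(self):
        self.comments = []

    def comment(self, text):
        # Called for every comment, including ones outside the root element
        self.comments.append(text)

    def close(self):
        # Whatever close() returns here is what parser.close() returns
        return self.comments


parser = ET.XMLParser(target=CommentCollector())
parser.feed(xml_text)
comments = parser.close()
print(comments)
```

The collected strings are the raw comment bodies, so the date still has to be extracted from them afterwards.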

The solution I'd suggest is to read the file as plain text and collect the comment lines:

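dates = []

# Read the file as plain text, since the comment sits outside the root element
with open('app.xml', 'r') as fp:
    for line in fp:
        # Keep only the lines that open an XML comment
        if line.lstrip().startswith('<!--'):
            dates.append(line)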

Now, in order to detect dates I'd suggest using a regex:

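import re
from datetime import datetime

# %I is the 12-hour clock; %p (AM/PM) only changes the hour when paired with %I, not %H
date_format = '%d %b %Y %I:%M:%S %p'
# \d{1,2} allows one- or two-digit hours; \s+ tolerates the double space before the time
date_regex = re.compile(r'\d{2} \w{3} \d{4}\s+\d{1,2}:\d{2}:\d{2} [AP]M')

for date in dates:
    match = date_regex.search(date)
    if match is None:
        continue  # comment without a recognizable date
    # The comment has no time zone, so the timestamp is treated as UTC here
    parsed = datetime.strptime(match.group(), date_format)
    date_formatted_to_epoch = (parsed - datetime(1970, 1, 1)).total_seconds()
    print(date_formatted_to_epoch)

For the sample comment above (18 JAN 2023 1:40:55 PM, read as UTC), this outputs:

1674049255.0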
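One thing to keep in mind is that the comment carries no time zone, so the epoch value depends on which zone you assume. The sketch below wraps the extraction in a small helper and makes that assumption an explicit parameter. The names comment_to_epoch, DATE_RE and DATE_FORMAT are hypothetical, and the UTC-5 offset is only an example. It uses %I rather than %H so that the PM marker is honored.

```python
import re
from datetime import datetime, timedelta, timezone

# Day, three-letter month, year, then a 12-hour time with AM/PM
DATE_RE = re.compile(r'\d{1,2} [A-Za-z]{3} \d{4}\s+\d{1,2}:\d{2}:\d{2} [AP]M')
DATE_FORMAT = '%d %b %Y %I:%M:%S %p'


def comment_to_epoch(comment, tz=timezone.utc):
    """Return the epoch seconds for the date found in comment, or None."""
    match = DATE_RE.search(comment)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(), DATE_FORMAT)
    # Attach the assumed zone before converting to a POSIX timestamp
    return parsed.replace(tzinfo=tz).timestamp()


line = '<!-- My Movies, Generated 18 JAN 2023  1:40:55 PM  -->'
utc_epoch = comment_to_epoch(line)  # assumes the time is UTC
est_epoch = comment_to_epoch(line, tz=timezone(timedelta(hours=-5)))  # assumes UTC-5
print(utc_epoch, est_epoch)
```

The two results differ by exactly five hours (18000 seconds), which is why it is worth deciding up front which zone the generator of the file used.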

Explanations:

CodeCop