0

I have downloaded an RSS file and saved it as city.txt.

Then I have to grab the date from the <lastBuildDate> tag.

The date is in the format Fri,28 Aug 2020, and I then have to translate the day and month names, all using regex.

I have managed to get the date, but I have a problem changing the day and month after I have found them.

Do I have to use re.sub?

My code:

import re

with open('city.txt', 'r', encoding='utf-8') as f:
    txt = f.read()

# Capture the text between <lastBuildDate ...> and </lastBuildDate>
tag_pattern = r'<lastBuildDate\b[^>]*>(.*?)</lastBuildDate>'
found = re.findall(tag_pattern, txt, re.I)
found = list(set(found))  # drop duplicates
for date_text in found:
    print('\t\t', date_text)
marvin
  • You might be able to feed the English date into `datetime.strptime()` and then [re-output it in your language](https://stackoverflow.com/questions/985505/locale-date-formatting-in-python). But otherwise, `re.sub()` seems like the correct method if you're being forced to use regex - after all, you only need to translate the days of the week and names of the months, right? – Green Cloak Guy Aug 29 '20 at 15:35
  • We could help you better if you add how/what you are trying to translate into. Also, for fun, this is worth a read: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Equinox Aug 29 '20 at 15:38
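
If the task really has to stay regex-only, the `re.sub()` route mentioned in the first comment above could look roughly like the sketch below. It uses a lookup table and a replacement function. The Greek abbreviations and the helper name `translate_date` are assumptions for illustration, not part of the original question; adjust the spellings to whatever your assignment expects.

```python
import re

# Assumed English -> Greek abbreviations (illustrative; check the spellings you need).
DAYS = {"Mon": "Δευ", "Tue": "Τρι", "Wed": "Τετ", "Thu": "Πεμ",
        "Fri": "Παρ", "Sat": "Σαβ", "Sun": "Κυρ"}
MONTHS = {"Jan": "Ιαν", "Feb": "Φεβ", "Mar": "Μαρ", "Apr": "Απρ",
          "May": "Μαϊ", "Jun": "Ιουν", "Jul": "Ιουλ", "Aug": "Αυγ",
          "Sep": "Σεπ", "Oct": "Οκτ", "Nov": "Νοε", "Dec": "Δεκ"}
TRANSLATIONS = {**DAYS, **MONTHS}

# One alternation of all known abbreviations, matched as whole words only.
ABBREV_PATTERN = re.compile(r"\b(" + "|".join(TRANSLATIONS) + r")\b")

def translate_date(text):
    """Replace every English day/month abbreviation in text with its Greek one."""
    return ABBREV_PATTERN.sub(lambda m: TRANSLATIONS[m.group(1)], text)

print(translate_date("Fri, 28 Aug 2020 17:36:59 GMT"))
```

Because the pattern only matches whole words, a string such as "Friday" is left untouched. The function can be applied to each value returned by `re.findall` in the question's code.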

3 Answers

0

Although parsing XML content with regexes is really not recommended, your question is actually about date translation.

One approach is to parse the XML content of your RSS file and retrieve the text value of the <lastBuildDate> node. You can then parse that text into a datetime object with datetime.strptime() from the datetime module.

The sample below shows how to get a datetime object from a string:

import datetime

# date_time_str contains the date string as formatted in your RSS
date_time_str = 'Fri,28 Aug 2020'
# date_time_obj contains the parsed value (formatted as '%a,%d %b %Y')
date_time_obj = datetime.datetime.strptime(date_time_str, '%a,%d %b %Y')

Then you just have to retrieve the wanted datetime elements as integers. You can display those values in the current locale with the calendar module if it matches your language. Otherwise, and this is a bit trickier, you can play with TimeEncoding and month_name. (Of course, you can also write your own translation system.)
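
The steps described in this answer can be sketched end to end as follows, using a hand-written translation table for the last step. The sample feed, the Greek abbreviation lists, and the function name `greek_build_date` are assumptions made for illustration; the date format is the one given in the asker's comment.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical minimal RSS document standing in for city.txt.
RSS_SAMPLE = """<rss version="2.0"><channel>
<title>City</title>
<lastBuildDate>Fri, 28 Aug 2020 17:36:59 GMT</lastBuildDate>
</channel></rss>"""

# Assumed Greek abbreviations, indexed by weekday() (Monday = 0) and month - 1.
GREEK_DAYS = ["Δευ", "Τρι", "Τετ", "Πεμ", "Παρ", "Σαβ", "Κυρ"]
GREEK_MONTHS = ["Ιαν", "Φεβ", "Μαρ", "Απρ", "Μαϊ", "Ιουν",
                "Ιουλ", "Αυγ", "Σεπ", "Οκτ", "Νοε", "Δεκ"]

def greek_build_date(rss_text):
    """Return the feed's lastBuildDate with Greek day and month abbreviations."""
    root = ET.fromstring(rss_text)
    raw = root.findtext("channel/lastBuildDate")
    # Parsing %a/%b assumes an English (or default "C") LC_TIME locale.
    parsed = datetime.datetime.strptime(raw.strip(), "%a, %d %b %Y %H:%M:%S GMT")
    day = GREEK_DAYS[parsed.weekday()]
    month = GREEK_MONTHS[parsed.month - 1]
    return f"{day}, {parsed.day:02d} {month} {parsed.year} {parsed:%H:%M:%S} GMT"

print(greek_build_date(RSS_SAMPLE))
```

Note that the weekday is looked up from the parsed date itself, not from the text in the feed, so a feed with a wrong weekday name would come out corrected.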

Amessihel
  • Thanks. I don't really need to change the format, only the language. It looks like this: Fri, 28 Aug 2020 17:36:59 GMT and I have to make it look like this: Παρ,28 Αυγ 2020 17:36 GMT – marvin Aug 29 '20 at 16:03
  • @marvin Yeah, actually it's not about _changing_ the format, but about _parsing_ it. What you want is to extract the date parts in order to translate them. This is what this approach is meant to do. – Amessihel Aug 29 '20 at 16:08
0

You can use the locale module in Python to display a date in Greek or any other local language.
Please see the code below, and refer to the Windows documentation for more locale string options.

import datetime
import locale

# Named date_str rather than input, which would shadow the built-in
date_str = 'Fri, 28 Aug 2020 17:36:59 GMT'
date_parsed = datetime.datetime.strptime(date_str, '%a, %d %b %Y %H:%M:%S GMT')

locale.setlocale(locale.LC_TIME, "el-CY")
print(date_parsed.strftime("%a, %d %b %Y %H:%M:%S"))

prints

Παρ, 28 Αύγ 2020 17:36:59
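
Locale names differ between platforms, so the string passed to `setlocale` above may not exist on every system. The sketch below tries several candidate names and restores the previous setting afterwards. The helper name `format_in_locale` and the candidate names are assumptions for illustration; which Greek locale is actually installed depends on your machine.

```python
import datetime
import locale

def format_in_locale(moment, fmt, candidates):
    """Format moment with the first installed locale from candidates, else None."""
    previous = locale.setlocale(locale.LC_TIME)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_TIME, name)
            except locale.Error:
                continue  # this locale name is not available on this system
            return moment.strftime(fmt)
        return None
    finally:
        locale.setlocale(locale.LC_TIME, previous)  # always restore

when = datetime.datetime(2020, 8, 28, 17, 36, 59)
# Assumed Greek locale names for Windows and for Linux/macOS respectively.
print(format_in_locale(when, "%a, %d %b %Y %H:%M:%S", ["el-GR", "el_GR.UTF-8"]))
```

If none of the candidates is installed, the function returns None, which is a cue to fall back to a hand-written translation table.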
Liju
0

I have updated your code based on your requirements, please give it a try.

Code

import re
import locale
import datetime

with open('city.txt', 'r', encoding='utf-8') as f:
    txt = f.read()

tag_pattern = r'<lastBuildDate\b[^>]*>(.*?)</lastBuildDate>'
found = re.findall(tag_pattern, txt, re.I)
found = list(set(found))  # drop duplicates (order is not preserved)
for date_text in found:
    # Parse with an English locale so %a and %b match "Fri", "Aug", ...
    locale.setlocale(locale.LC_TIME, "en")
    temp = datetime.datetime.strptime(date_text, '%a, %d %b %Y %H:%M:%S GMT')
    # Switch to Greek for output ("en" and "el-GR" are Windows-style locale names)
    locale.setlocale(locale.LC_TIME, "el-GR")
    print(temp.strftime("%a, %d %b %Y %H:%M:%S"))

Sample input

<lastBuildDate>Fri, 28 Jan 2020 13:32:12 GMT</lastBuildDate>
<lastBuildDate>Sun, 27 Feb 2020 15:36:53 GMT</lastBuildDate>
<lastBuildDate>Mon, 26 Aug 2020 16:30:43 GMT</lastBuildDate>

Output

Τετ, 26 Αυγ 2020 16:30:43
Πεμ, 27 Φεβ 2020 15:36:53
Τρι, 28 Ιαν 2020 13:32:12
Liju