5

I am trying to parse an RSS feed. Entries in the feed have date elements like:

<dc:date>2016-09-21T16:00:00+02:00</dc:date>

Using feedparser, I try to do:

published_time = datetime.fromtimestamp(mktime(entry.published_parsed))

But the problem is that I seem to be getting the wrong time stored in the database. In this particular case, the datetime is stored as:

2016-09-21 13:00:00

... when I would expect 14:00 - the correct UTC time.

I assume the problem is in our django settings, where we have:

TIME_ZONE = 'Europe/Berlin'

Because when I switch to:

TIME_ZONE = 'UTC'

... the datetime is stored as correct UTC time:

2016-09-21 14:00:00

Is there any way to keep the django settings as they are, but to parse and store this datetime correctly, without the django timezone setting affecting it?

EDIT: Maybe it's more clear like this...

print entry.published_parsed
published_time = datetime.fromtimestamp(mktime(entry.published_parsed))
print published_time
localized_time = pytz.timezone(settings.TIME_ZONE).localize(published_time, is_dst=None)
print localized_time

time.struct_time(tm_year=2016, tm_mon=9, tm_mday=21, tm_hour=14, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=265, tm_isdst=0)
2016-09-21 15:00:00
2016-09-21 15:00:00+02:00
apiljic
  • Are you interested in a time zone conversion or would you be open to simply adding an hour with a datetime.timedelta operation? – JwM Dec 14 '15 at 18:15
  • Ultimately, I'd like to have the correct time in UTC. Taking an hour away now (two hours in day saving period) may be a way to go. I haven't looked at it yet though. I was wondering if there was another way. I tried for instance timezone.activate() and timezone.deactivate() which seemed to change the current_timezone in the right way, but that didn't fix the problem. – apiljic Dec 14 '15 at 19:03
  • You can make a datetime aware, or change the timezone if it's already aware but wrong. – Lorenzo Peña Dec 14 '15 at 20:35

3 Answers

2

feedparser's entry.published_parsed is always a UTC time tuple, whatever the input time string is. To get a timezone-aware datetime object:

from datetime import datetime, timezone

utc = timezone.utc  # any UTC tzinfo object works here, e.g. pytz.utc
utc_time = datetime(*entry.published_parsed[:6], tzinfo=utc)

where utc is a tzinfo object such as datetime.timezone.utc, pytz.utc, or just your custom tzinfo (for older python versions).

You shouldn't pass a UTC time tuple to mktime(), which expects local time. The same error appears in the question "Have a correct datetime with correct timezone".

Make sure USE_TZ=True so that django uses aware datetime objects everywhere. Given a timezone-aware datetime object, django should save it to db correctly whatever your TIME_ZONE or timezone.get_current_timezone() are.
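To make this concrete, here is a minimal sketch that needs neither feedparser nor Django. The name published_parsed is a hand-built stand-in for entry.published_parsed (an assumption for illustration), holding the question's 16:00+02:00 value already normalized to 14:00 UTC.

```python
import calendar
import time
from datetime import datetime, timezone

# Stand-in for entry.published_parsed: 2016-09-21 14:00:00 in UTC
# (hypothetical value, built by hand instead of parsed from a feed).
published_parsed = time.struct_time((2016, 9, 21, 14, 0, 0, 2, 265, 0))

# Build an aware datetime directly from the first six fields.
utc_time = datetime(*published_parsed[:6], tzinfo=timezone.utc)
print(utc_time.isoformat())  # 2016-09-21T14:00:00+00:00

# Going through an epoch timestamp yields the same instant, provided the
# tuple is read as UTC (calendar.timegm) and not as local time (mktime).
via_epoch = datetime.fromtimestamp(calendar.timegm(published_parsed), tz=timezone.utc)
print(via_epoch == utc_time)  # True
```

Neither route consults the machine's time zone or Django's TIME_ZONE, so the result should be the same on any server.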

jfs
1

Have you tried using datetime.utcfromtimestamp() instead of datetime.fromtimestamp()?

As a secondary solution, you can get the unparsed data (I believe it's available as entry.published?) and just use python-dateutil to parse the string, then convert it to pytz.utc timezone like this.

>>> import pytz
>>> from dateutil import parser
>>> dt = parser.parse('2016-09-21T16:00:00+02:00')
>>> dt
datetime.datetime(2016, 9, 21, 16, 0, tzinfo=tzoffset(None, 7200))
>>> dt.astimezone(pytz.utc)
datetime.datetime(2016, 9, 21, 14, 0, tzinfo=<UTC>)
James J.
  • time.struct_time(tm_year=2016, tm_mon=9, tm_mday=21, tm_hour=14, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=265, tm_isdst=0) 2016-09-21 13:00:00 2016-09-21 13:00:00+00:00 ... This is the output of utcfromtimestamp(). The timezone is changed, but the time is still not correct. – apiljic Dec 14 '15 at 23:16
  • Second solution could work. My only concern is that there are many different date formats. From what we encountered so far, feedparser didn't have a problem with any of them. I am wondering if the parser you suggest works equally well. Do you use it for many different date formats? – apiljic Dec 15 '15 at 00:30
  • @apiljic: use feedparser to parse input time strings (`_parsed` attributes). `dateutil` accepts too many input time formats and therefore may return a wrong result silently. – jfs Dec 15 '15 at 15:23
1

Use

published_time = pytz.utc.localize(datetime.utcfromtimestamp(calendar.timegm(parsed_entry.published_parsed)))

Feedparser can parse a wide range of date formats; you can find them here.

As you can see in feedparser/feedparser/datetimes/__init__.py, Feedparser's internal _parse_date function does the following:

Parses a variety of date formats into a 9-tuple in GMT

This means that in parsed_entry.published_parsed you have a time.struct_time object in GMT timezone.

When you convert it to a datetime object using

published_time = datetime.fromtimestamp(mktime(parsed_entry.published_parsed))

the problem is that mktime assumes that the passed tuple is in local time, which it is not: it's GMT/UTC! On top of that, the datetime object is never properly localized at the end of the conversion.

You need to replace that conversion with the following, remembering that Feedparser returns a GMT struct_time, and localize that with the timezone you like (UTC for the sake of simplicity).

  • You use calendar.timegm, which gives the number of seconds between epoch and the date passed as a parameter, assuming that the passed object is in UTC/GMT (we know from Feedparser it is)
  • You use utcfromtimestamp to obtain a naive datetime object (which we know represents a datetime in UTC, but Python does not at this moment)
  • With pytz.utc.localize you attach the UTC time zone to that naive datetime object, which makes it timezone-aware.

Example:

import calendar
from datetime import datetime
import pytz
localized_dt = pytz.utc.localize(datetime.utcfromtimestamp(calendar.timegm(parsed_entry.published_parsed)))

As long as you are consistent, it doesn't matter if you use fromtimestamp or utcfromtimestamp. If you use fromtimestamp you need to tell Python that the datetime object you created has the local timezone. Supposing you are in Europe/Berlin, this is also fine:

pytz.timezone('Europe/Berlin').localize(datetime.fromtimestamp(calendar.timegm(parsed_entry.published_parsed)))

If parsed_entry.published_parsed were in the local time zone instead, mktime would have to be used in place of calendar.timegm.
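The pairing of the two functions can be checked with the standard library alone. The sketch below uses a hand-picked epoch value (an assumption for illustration, equal to 2016-09-21 14:00:00 UTC) and shows that timegm undoes gmtime while mktime undoes localtime.

```python
import calendar
import time

timestamp = 1474466400  # 2016-09-21 14:00:00 UTC (illustrative value)

utc_tuple = time.gmtime(timestamp)       # broken-down time in UTC
local_tuple = time.localtime(timestamp)  # broken-down time in the machine's zone

# Each function reverses its own counterpart.
assert calendar.timegm(utc_tuple) == timestamp
assert int(time.mktime(local_tuple)) == timestamp

# Mixing them is the bug from the question: a UTC tuple fed to mktime is
# shifted by the machine's UTC offset (how DST is applied depends on the
# tm_isdst flag); the skew is zero only on a machine running in UTC.
skew = int(time.mktime(utc_tuple)) - timestamp
print(skew)
```

On a server set to Europe/Berlin the printed skew should be non-zero, which matches the shifted time seen in the database.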

As an alternative, you can parse the date string that Feedparser exposes as parsed_entry['published'] yourself:

from dateutil import parser
localized_dt = parser.parse(parsed_entry['published'])

You can check that the following returns True:

parser.parse(parsed_entry['published']) == pytz.utc.localize(datetime.utcfromtimestamp(calendar.timegm(parsed_entry.published_parsed)))
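The comparison holds because aware datetimes compare by the instant they represent, not by their wall-clock fields. A small standard-library sketch, with the question's sample date hard-coded as an assumption instead of being read from a feed:

```python
from datetime import datetime, timedelta, timezone

# The feed's value: 16:00 at UTC+02:00 ...
with_offset = datetime(2016, 9, 21, 16, 0, tzinfo=timezone(timedelta(hours=2)))
# ... and the same instant expressed in UTC.
in_utc = datetime(2016, 9, 21, 14, 0, tzinfo=timezone.utc)

same_instant = with_offset == in_utc
print(same_instant)  # True

# Converting only changes the representation, not the instant.
converted = with_offset.astimezone(timezone.utc)
print(converted.isoformat())  # 2016-09-21T14:00:00+00:00
```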

The Django TIME_ZONE setting doesn't actually matter, because it's used only for visualization purposes or to automatically convert naive datetimes.

When USE_TZ is True, this is the default time zone that Django will use to display datetimes in templates and to interpret datetimes entered in forms.

What is important is to always use properly localized datetimes, no matter which time zone is used. As long as they are not in naive format, they will be properly handled by Django.
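If you want to guard against naive values slipping through before they reach the ORM, a check along these lines may help. It is only a sketch: the is_aware helper below is written by hand following the definition of "aware" in the Python datetime documentation (Django ships a similar django.utils.timezone.is_aware), and the sample values are illustrative.

```python
from datetime import datetime, timezone

def is_aware(dt):
    """Return True if dt carries usable time zone information."""
    return dt.tzinfo is not None and dt.utcoffset() is not None

naive = datetime(2016, 9, 21, 14, 0)        # no tzinfo: ambiguous instant
aware = naive.replace(tzinfo=timezone.utc)  # explicitly marked as UTC

print(is_aware(naive))  # False
print(is_aware(aware))  # True
```

Note that replace(tzinfo=...) only labels the value; it is correct here because the naive fields are already known to be UTC.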

andrea.ge
  • it is unnecessarily complicated. Here's a [simpler solution](http://stackoverflow.com/a/34292796/4279) – jfs Dec 15 '15 at 15:26
  • I agree, you need this complication when you need to consider the dst flag, which is the case for a local time (that is where you use mktime) and not for UTC, which has no DST. – andrea.ge Dec 15 '15 at 15:59
  • if the time is not UTC then the code is not merely complicated; it is just wrong. – jfs Dec 15 '15 at 16:01
  • What is wrong with `local_tz.localize(datetime.fromtimestamp(mktime(x)))`? – andrea.ge Dec 15 '15 at 16:03
  • there are several things wrong but I'm talking about the very first code example in your answer (with `timegm()`). – jfs Dec 15 '15 at 16:04
  • Using calendar.timegm() solved the problem for me. Thanks! – apiljic Dec 15 '15 at 16:36
  • @J.F. Sebastian: Could you briefly explain why one shouldn't use timegm()? – apiljic Dec 15 '15 at 16:45
  • You don't need it (as my answer shows) if the input is UTC. And it is wrong if the input is not UTC. – jfs Dec 15 '15 at 16:46