0

I'm getting dates from my API in iso format.

When I'm doing:

df = DataFrame(results)
df.to_csv(path_or_buf=file_name, index=False, encoding='utf-8',
          compression='gzip',
          quoting=QUOTE_NONNUMERIC)

And I look at the CSV I see for example:

lastDeliveryDate
2018-11-21 16:25:53.990000-05:00

However,

When I do:

df = DataFrame(results)
df.to_json(path_or_buf=file_name, orient="records",compression='gzip', lines=True)

I see (other record):

"lastDeliveryDate":1543258826689

This is a problem.

When I load the data from the CSV to Google BigQuery eveything is fine. The date is parsed correctly.

But when I changed the loading to Json. It doesn't parse the date correctly.

I see the dates in this format:

50866-01-09 23:46:40 UTC

This occurs because the to_json() and to_csv() produce different results for dates in iso_format

How can I fix this? Must I edit the data frame and convert all my dates columns to regular UTC? how can I do that? and why it's needed for to_json() but not for to_csv() ?

as explained at How do I translate an ISO 8601 datetime string into a Python datetime object? Tried to do:

df["lastDeliveryDate"] = dateutil.parser.parse(df["lastDeliveryDate"])  

But it gives:

TypeError: Parser must be a string or character stream, not Series

Luis
  • 1,305
  • 5
  • 19
  • 46

2 Answers2

2

From the Pandas documentation on to_json():

date_format: {None, ‘epoch’, ‘iso’}
Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient='table', the default is ‘iso’. For all other orients, the default is ‘epoch’.

So, with orient="records", you'll have to set date_format="iso" to get a date-time format that can be understood later on:

df.to_json(path_or_buf=file_name, orient="records", date_format="iso", 
           compression='gzip', lines=True)
9769953
  • 10,344
  • 3
  • 26
  • 37
1

Basically dateutil.parser.parse() is expecting a string as a parameter, but you passed the whole column. Try with the lambda function:

df["lastDeliveryDate"] = df["lastDeliveryDate"].apply( lambda row: dateutil.parser.parse(row))
RoyaumeIX
  • 1,947
  • 4
  • 13
  • 37