0

I'm importing data from an Excel spreadsheet into python. My dates are coming through in a bizarre format of which I am not familiar and cannot parse.

in excel: (7/31/2015)

42216

after I import it:

u'/Date(1438318800000-0500)/'

Two questions:

  1. what format is this and how might I parse it into something more intuitive and easier to read?
  2. is there a robust, swiss-army-knife-esque way to convert dates without specifying input format?
rvictordelta
  • 630
  • 2
  • 8
  • 23
  • Without code it can't be see where appear problems. However, take a look to some related questions like http://stackoverflow.com/questions/1108428/how-do-i-read-a-date-in-excel-format-in-python. – Mihai8 Jul 30 '15 at 22:55
  • 1
    I think that's how excel stores dates - the number before the minus is most likely seconds since the epoch (1st Jan 1970) and the -0500 bit a timezone. – SteJ Jul 30 '15 at 22:56
  • Excel stores dates in that format. It is not stored as a string (the way it is displayed). –  Jul 30 '15 at 23:15
  • Incidentally (and this is pedantic) the format displayed bears no resemblance to how Excel stores dates. Excel tends to store dates as the number of days from 1st Jan 1900 plus some cell formatting information that makes it appear as a date, though the 2010 XML formats do allow ISO8601 formatting. – ig0774 Jul 31 '15 at 00:02

1 Answers1

0

Timezones necessarily make this more complex, so let's ignore them...

As @SteJ remarked, what you get is (close to) the time in seconds since 1st January 1970. Here's a Wikipedia article how that's normally used. Oddly, the string you get seems to have a timezone (-0500, EST in North America) attached. Makes no sense if it's properly UNIX time (which is always in UTC), but we'll pass on that...

Assuming you can get it reduced to a number (sans timezone) the conversion into something sensible in Python is really straight-forward (note the reduction in precision; your original number is the number of milliseconds since the epoch, rather than the standard number of seconds from the epoch):

from datetime import datetime

time_stamp = 1438318800

time_stamp_dt = datetime.fromtimestamp(time_stamp)

You can then get time_stamp_dt into any format you think best using strftime, e.g., time_stamp_dt.strftime('%m/%d/%Y'), which pretty much gives you what you started with.

Now, assuming that the format of the string you provided is fairly regular, we can extract the relevant time quite simply like this:

s = '/Date(1438318800000-0500)/'
time_stamp = int(s[6:16])
ig0774
  • 39,669
  • 3
  • 55
  • 57
  • getting an out of range error on your third line: _print datetime.fromtimestamp(time_stamp)_ **ValueError: timestamp out of range for platform localtime()/gmtime() function** – rvictordelta Jul 30 '15 at 23:26
  • Admittedly I don't have Python 2 available to test with, but try the latest version. I had to edit because I initially assumed it was just an epoch date. – ig0774 Jul 30 '15 at 23:45
  • Ah. My problem was that my original number is in nanosecs since the epoch.. you fed datetime milliseconds. If you make that clear in your response I'll select your answer. – rvictordelta Jul 31 '15 at 18:35
  • @rvictordelta: I've clarified it a bit, but I thought I made that clear in my answer? See the third paragraph... Btw, your original number is the number of milliseconds from the epoch whereas `fromtimestamp` wants the number of seconds since the epoch. – ig0774 Jul 31 '15 at 18:49