5

I have an excel file which has 3 columns as either date-time or date or time fields. I am reading it via xlrd package and I am getting time as milliseconds I suppose and when I am trying to convert it back to datetime I am getting wrong results.

I tried with converting the file to csv as well. That too doesn't help and I get weird datetime format which I am not able to comprehend.

Here is what I tried with the xlrd format. I prefer to use files with .xlrs extension as input because otherwise I have to convert excel files to .csv every time I get a new input file.

from xlrd import open_workbook
import os,pickle,datetime

def main(path, filename, absolute_path_organisation_structure):
    absolute_filepath = os.path.join(path,filename)

    wb = open_workbook(absolute_filepath)
    for sheet in wb.sheets():
        number_of_rows = sheet.nrows
        number_of_columns = sheet.ncols

        for row_index in xrange(1, sheet.nrows):
            row=[]
            for col_index in xrange(4,7): #4th and 6th columns are date fields
                row.append(sheet.cell(row_index, col_index).value)

            print(row)  #Relevant list formed with 4th, 5th and 6th columns
            print(datetime.datetime.fromtimestamp(float(row[0])).strftime('%Y-%m-%d %H:%M:%S'))


path = "C:\\Users\\***************\\NEW DATA"
MISfile  = "P2P_2015 - Copy.xlsx"
absolute_path_organisation_structure = "C:\\Users\\******************NEW DATA\\organisation.csv"
main(path, MISfile, absolute_path_organisation_structure)

Result:

[42011.46789351852, u'Registered', 42009.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG1 approval', 42010.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent back', 42010.0]
1970-01-01 17:10:11
[42011.46789351852, u'Registered', 42011.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG1 approval', 42011.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG2 approval', 42012.0]
1970-01-01 17:10:11
[42011.46789351852, u'CTG2 Approved', 42012.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent back', 42013.0]
1970-01-01 17:10:11
[42170.61667824074, u'Registered', 42144.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42144.0]
1970-01-01 17:12:50
[42170.61667824074, u'Sent back', 42165.0]
1970-01-01 17:12:50
[42170.61667824074, u'Sent back', 42165.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42170.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42170.0]
1970-01-01 17:12:50

Actual input file: (copied from excel)

1/7/2015 11:13  Registered  1/5/2015 0:00
1/7/2015 11:13  Sent for CTG1 approval  1/6/2015 0:00
1/7/2015 11:13  Sent back   1/6/2015 0:00
1/7/2015 11:13  Registered  1/7/2015 0:00
1/7/2015 11:13  Sent for CTG1 approval  1/7/2015 0:00
1/7/2015 11:13  Sent for CTG2 approval  1/8/2015 0:00
1/7/2015 11:13  CTG2 Approved   1/8/2015 0:00
1/7/2015 11:13  Sent back   1/9/2015 0:00
6/15/2015 14:48 Registered  5/20/2015 0:00
6/15/2015 14:48 Registered  5/20/2015 0:00
6/15/2015 14:48 Sent back   6/10/2015 0:00
6/15/2015 14:48 Sent back   6/10/2015 0:00
6/15/2015 14:48 Registered  6/15/2015 0:00
6/15/2015 14:48 Registered  6/15/2015 0:00

Why am I not able to read dates correctly? Why are they not simply coming up as strings so that I can easily convert them?

Shivendra
  • 1,076
  • 2
  • 12
  • 26

3 Answers3

2

The issue is that you are interpreting the Excel datetime value as a UNIX timestamp, when they are not the same thing. The warning flag to look for is that the resulting values are all near the UNIX epoch (1970-01-01).

You can convert from an Excel datetime to UNIX by using the method described in this answer.

Windows/Mac Excel 2011

Unix Timestamp = (Excel Timestamp - 25569) * 86400

Mac Excel 2007

Unix Timestamp = (Excel Timestamp - 24107) * 86400

If you apply this conversion you should get the correct output:

timestamp = (float(row[0]) - 25569) * 86400
datetime.datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
Community
  • 1
  • 1
mfitzp
  • 15,275
  • 7
  • 50
  • 70
  • Due to lack of time I will just use the solution. But I am still getting incorrect time values. Also is it possible to just get it like what it appears like in the excel file? – Shivendra Jan 11 '16 at 13:02
  • @Shivendra try the answer from @khajvah which uses the `xlrd` API to get the correct values. – mfitzp Jan 11 '16 at 13:04
  • @Shivendra if it is just time values that are incorrect you probably need to adjust the timezone – mfitzp Jan 11 '16 at 13:08
2

xldate_as_tuple(xldate, datemode) [#]

Convert an Excel number (presumed to represent a date, a datetime or a time) into a tuple suitable for feeding to datetime or mx.DateTime constructors.

Source: http://www.lexicon.net/sjmachin/xlrd.html#xlrd.xldate_as_tuple-function

Usage Example: How to use ``xlrd.xldate_as_tuple()``

Community
  • 1
  • 1
khajvah
  • 4,889
  • 9
  • 41
  • 63
  • 1
    @Shivendra `dt = xldate_as_tuple(cell.value, wb.datemode)` and then `pydate = datetime.datetime(*dt)` – mfitzp Jan 11 '16 at 13:16
1

If the Excel file to read is a table can be simple and straightforward to use pandas.read_excel. After transforming dates with pandas.to_datetime

from __future__ import absolute_import, division, print_function
import os
import pandas as pd

def main(path, filename, absolute_path_organisation_structure):
    absolute_filepath = os.path.join(path,filename)
    #Relevant list formed with 4th, 5th and 6th columns
    df = pd.read_excel(absolute_filepath, header=None, parse_cols=[4,5,6])
    # Transform column 0 and 2 to datetime
    df[0] = pd.to_datetime(df[0])
    df[2] = pd.to_datetime(df[2])
    print(df)

path = "C:\\Users\\***************\\NEW DATA"
MISfile  = "P2P_2015 - Copy.xlsx"
main(path, MISfile,None)
jrovegno
  • 699
  • 5
  • 11
  • I was unaware of the excel reading capabilities of `pandas`. Thanks for helping even after I had already selected a working answer. This solution is also working after removing headers from the date columns. – Shivendra Jan 13 '16 at 06:30