The problem
According to this issue, the problem is a result of Excel's implementation; not OpenPyXL's:
The XLSX file format cannot be relied upon to store dates and times accurately. These are always converted internally to a serial form of limited precision.
Work-around option 1: Text
You could represent the values as text in Excel and then parse that text. For example...
Say you have a Date value in cell A2 with the custom format "m/d/yy h:mm:ss":
8/16/19 6:55:00
In a separate column, you can write the formula
=TEXT(A2,"yyyy-mm-dd")&"T"&TEXT(A2,"hh:mm:ss")
and copy to other cells in the column, resulting in
2019-08-16T06:55:00
If you don't want this separate column, just overwrite column A by copying and pasting the values from the new text column into it.
Now, when you read the values you'll need to be careful to use the option data_only=True
with openpyxl.load_workbook
so you read the actual value of the text; not the formula.
import openpyxl
wb = openpyxl.load_workbook('doc.xlsx', data_only=True)
ws = wb.active
for rowNum in range(2, ws.max_row + 1):
date = ws['A' + str(rowNum)].value
print(date)
Then parse the text in date
according to the ISO 8601 format.
Work-around option 2: Rounding
One drawback to the text option is that it requires more work in the Excel spreadsheet. If you know the precision you're going for, you can have Python round the values for you.
Caution should be taken, however! If the desired precision is too small, the lack of precision on Excel's part could conceivably result in rounding to incorrect numbers. This could be especially true of compounding formulas, such as =<above cell>+1/24
.
See this question (and particularly this answer) for a good solution to rounding datetimes. Here's a spin-off of solution I used for my own code (note f-string use; you'll need to use str.format()
if you're using Python < 3.6):
import datetime as dt
def round_datetime(d, precision='second'):
"""Round a datetime to the named precision.
Cf. https://stackoverflow.com/a/3464000/7232335
"""
d += dt.timedelta(**{f"{precision}s": 1}) / 2
d -= dt.timedelta(minutes=d.minute,
seconds=d.second,
microseconds=d.microsecond)
return d # Return optional, since object d is modified in-place.
Side note:
There are Worksheet methods available for iterating through rows with OpenPyXL:
1. ws.iter_rows()
for row in ws.iter_rows(min_row=2, values_only=True):
date = row[0]
2. ws.values
Same as ws.iter_rows()
with no arguments except values_only=True
.
values = ws.values
next(values) # Discard first row.
for row in values:
date = row[0]
3. ws.rows
Same as ws.iter_rows()
with no arguments.
rows = ws.rows
next(rows) # Discard first row.
for row in ws.rows:
date = row[0].value