1

I'm using Openpyxl to iterate over the values of a worksheet column, where there are values like this:

192:45:49

This should be a simple string and considered as such (I need to extract with a re.match() the initial "192"), but Python - as I can see using type(cell.value) - thinks it's a datetime object:

for col_cells in ws.iter_cols(min_col=3, max_col=3):
   for cell in col_cells:
       print type(cell.value)

type 'datetime.datetime'

I have no control over the initial file, so how can I consider it as a simple string? I find plenty of documentation on how to "convert" datetime objects to strings, but that's not what I'm after.

CharlesM
  • 521
  • 1
  • 7
  • 16

3 Answers3

0

Have you tried wrapping you the value in str(value) before you do the regex search? Casting it explicitly as a string could help. Are you iterating over each cell to use regex to match? do all the cells have this issue or is it just this one cell in particular?

Drew Gillies
  • 53
  • 1
  • 4
  • 10
  • Tried to use str(cell.value), but I only get some random date, like "1900-01-08 03:48:25". I'm iterating over each cell and they all have this issue. – CharlesM Sep 12 '19 at 20:59
0

An indication is given on reddit:

If you want to verify that the date format is what you expect, then you'll want to check the openpyxl cell's number_format not the cell's value.

This should give you the number formats for your cells:

for col_cells in ws.iter_cols(min_col=3, max_col=3):
   for cell in col_cells:
       print type(cell.number_format)

You could then try to change that number format: Openpyxl setting number format

As a last resort, if you cannot find a decent solution to prevent this misinterpretation, you could try to convert your number back. Dates are essentially numbers of days since the 1/1/1900 at 00:00 and the rest is the fraction of the day (to be converted in hours, minutes and seconds). On how to do this in Python: How to convert a given ordinal number (from Excel) to a date.

fralau
  • 3,279
  • 3
  • 28
  • 41
  • Thanks for the suggestions. type(cell.number_format) is always , before and after setting a number_format value, while cell.number_format is [H]:MM:SS before setting it to whatever (like '0') ... But, still, when printed it gets treated as a date. I'll try the "convert back" idea – CharlesM Sep 13 '19 at 07:21
  • If it appears really as a bug or strange behaviour, perhaps you would want to post it as an issue on https://bitbucket.org/openpyxl/openpyxl/issues ? (They have quite a few [questions on dates](https://bitbucket.org/openpyxl/openpyxl/issues?q=date)) – fralau Sep 13 '19 at 08:03
0

Unfortunately, the way Excel handles dates and times is very poor. However, openpyxl doesn't just use heuristics to guess what's intended but relies on the formatting applied. In this case it's quite clear that hour:minute:second has been applied so the conversion is correct.

If you want do anything else then just convert the time to a string and reassign it and set the number format to None.

Charlie Clark
  • 18,477
  • 4
  • 49
  • 55