1

I have a spreadsheet with cells containing text with no formulas, those with formulas, and those formatted as datetime. I need to read the actual value for all these three cases.

Using Python 3 with Openpyxl 3.0.3.

Currently, I have the following code. Using data_only=True, as suggested on other SO posts about this kind of problem.

workbook = load_workbook(r"path\to\my\spreadsheet.xlsx", data_only=True)
# select first available sheet
sheet = workbook.active
sheet.title

for value in sheet.iter_rows(min_row=2,
                             max_row=86,
                             min_col=1,
                             max_col=7,
                             values_only=True):
     print(value)

A snippet of the resulting output is shown below. In this example, the first column is fine. The third column is date-formatted, but I need the actual value of the cell (so they would be 26/1/2020 and 27/1/2020). The fifth and sixth columns have formulas, but I need the actual value.

(2, None, datetime.datetime(2020, 1, 26, 0, 0), None, '=SUM(D$3:D4)', '=E4/E3', None)
(3, None, datetime.datetime(2020, 1, 27, 0, 0), None, '=SUM(D$3:D5)', '=E5/E4', None)
Björn
  • 1,610
  • 2
  • 17
  • 37
Al2110
  • 566
  • 9
  • 25

1 Answers1

1

The third column is date-formatted, but I need the actual value of the cell (so they would be 26/1/2020 and 27/1/2020)
As for formatting there is the strfomart
option from the datetime module in python

from datetime import datetime
value[2].strformat("%d/%m/%Y)


Could you validate that the values are imported and not the formulae in the first place, like it's done here
print(sheet["a1"].value)


Edit I created an Excel Test File with three Columns
enter image description here

Whats wierd about it is, that it returns the correct output in your for loop.
The output

(1, 5, 6)
(2, 7, 9)
(3, 9, 12)
(4, 11, 15)

I found the below comment (source)

FWIW, as documented in openpyxl.load_workbook(), the value that you get when opening a Workbook with data_only=True is "the value stored the last time Excel read the sheet". This relies on a caching feature of .xls[x/m/...] files (about which I find documentation everywhere except at Microsoft). – user948581

Maybe try to open the file check if the formulas are correctly calculated (displayed as a value in the cell) and then save the file and make sure it is closed.

Björn
  • 1,610
  • 2
  • 17
  • 37
  • And the cells with the formulas? Currently it returns the formula entered, not the resulting value of the cell. – Al2110 Apr 18 '20 at 07:54
  • I tried validating that the values are imported as you suggested, and it gives a syntax error, without explanation. – Al2110 Apr 18 '20 at 08:00
  • could you try in the loop `value.value` ? – Björn Apr 18 '20 at 08:04
  • it gives an error: 'tuple' object as no attribute 'value' – Al2110 Apr 18 '20 at 08:06
  • Updated the answer. It's really strange for a mockup example I created it worked totally fine. Btw I just validated I use the same version as you `openpyxl 3.0.3` on `python 3.8` – Björn Apr 18 '20 at 08:13
  • I checked the formulas are entered correctly, and the file was saved, and not open when the program is running. – Al2110 Apr 18 '20 at 08:21
  • At this point I am clueless. You could try to create a second test excel file and see if it works there properly – Björn Apr 18 '20 at 08:34
  • @AI2110 ***'tuple' object as no attribute 'value'***: Only a `Cell` object has a `value` attribute. You have to remove `values_only=True` and loop the tuple, which is an a sequence of `Cell` objects, – stovfl Apr 18 '20 at 09:32