138

I am using openpyxl to read a cell value (excel addin-webservice updated this column).

I have used data_only = True but it is not showing the current cell value, instead it is the value stored the last time Excel read the sheet.

wbFile = openpyxl.load_workbook(filename = xxxx, data_only=True)
wsFile = wbFile[c_sSheet]

How can I read the actual cell value?

Josh Correia
  • 3,807
  • 3
  • 33
  • 50
user3411047
  • 1,425
  • 2
  • 9
  • 14
  • 9
    I don't think `openpyxl` replicates Excel's huge formula-evaluation subsystem, so I think you just can't have it automatically recompute all formulas. Best case, you can implement your own `eval`-based interpreter for a subset of that large formula language. – Alex Martelli Feb 14 '15 at 16:55
  • i dont need compute or execute forumula. i just need to read the cell value. – user3411047 Feb 14 '15 at 17:05
  • 5
    "The cell value" (beyond the formula) is what Excel saved when it saved the sheet and you've implied it's not really what you want -- you want the "current" value (depending no doubt on values in other cells) and that means you *do* need to recompute the formula based on other cells' current values! – Alex Martelli Feb 14 '15 at 17:10
  • sorry to confuse you, Bacially my column is upated via an excel add in (webservice will poll data )... and i need to extract that column cell values. – user3411047 Feb 14 '15 at 17:55
  • 6
    I think you need to either (A) involve Excel again to get the formulas recomputed, either via said add-in or maybe http://xlwings.org/ and the like; or (B) if the formulas are simple enough implement your own formula interpreter as I said. `openpyxl` and other Excel-free readers of Excel files will not do the formula-evaluation that you need (whether you think you need it or not:-). – Alex Martelli Feb 14 '15 at 18:30
  • Does this answer your question? [How to access the real value of a cell using the openpyxl module for python](https://stackoverflow.com/questions/22613272/how-to-access-the-real-value-of-a-cell-using-the-openpyxl-module-for-python) – Josh Correia Aug 01 '23 at 05:59

10 Answers10

262
wb = openpyxl.load_workbook(filename, data_only=True)

The data_only flag helps.

Gábor Erdős
  • 3,599
  • 4
  • 24
  • 56
Marcin Kajzler
  • 2,698
  • 1
  • 9
  • 7
  • 8
    Yes. This answers the question. If you want access both, equation and value then you might endup in two instances, say `wb2 = openpyxl.load_workbook(filename)`. Then on reading corresponding index, you get value from `wb` and equation from `wb2`. – Ajeeb.K.P Apr 18 '16 at 14:07
  • 1
    hi, the cell wasn't empty. It is because openpyxl does not evaluate formulae. I went around this by taking the excel data and putting all the computation on the servers :| – galeej Feb 10 '18 at 08:20
  • It might be worth adding an Issue to the official feature tracker at https://bitbucket.org/openpyxl/openpyxl/issues but judging by https://bitbucket.org/openpyxl/openpyxl/issues/291/reading-cell-values-with-formula-is-not , I suspect it's an intentionally-avoided feature: Edits to the formula would be difficult to synchronize with the computed value. You'd have to re-implement the entire formula-evaluation system. That might be a copyright violation, require an Excel (or OpenOffice) license, or restrict the workbook to effectively read-only. – Sarah Messer Dec 26 '19 at 16:41
  • It doesn’t answer the initial question? Using xlwings package is a solution in this case. – Dilshat Jan 24 '20 at 18:38
  • 4
    With data_only = True, cell.value will return the value ( as last know by Excel when the file was written). cell.internal_value will return the formula. – Burtski Apr 08 '21 at 18:47
  • @SarahMesser FYI: issue #291 is now tracked here: https://foss.heptapod.net/openpyxl/openpyxl/-/issues/291 – Jean-Francois T. May 20 '22 at 07:05
  • 2
    I got some weird behavior: I get `None` for both `cell.value` or `cell.internal_value` when using `data_only=True` and trying to read a cell with a formula (works fine for static cells or without `data_only=True`). Any idea? – Jean-Francois T. May 20 '22 at 07:09
  • I don't see how this answers the question, OP already has `data_only=True` in their code. – Josh Correia Aug 01 '23 at 05:48
21

As @alex-martelli says, openpyxl does not evaluate formulae. When you open an Excel file with openpyxl you have the choice either to read the formulae or the last calculated value. If, as you indicate, the formula is dependent upon add-ins then the cached value can never be accurate. As add-ins outside the file specification they will never be supported. Instead you might want to look at something like xlwings which can interact with the Excel runtime.

Josh Correia
  • 3,807
  • 3
  • 33
  • 50
Charlie Clark
  • 18,477
  • 4
  • 49
  • 55
  • 2
    Two questions on this answer: (1) how do you differentiate between opening the XL file to read forumulae v.s. reading the last calculated value? Is this via the `data_only=True` param? (2) What does `last calculated value` mean in the real world? That is, if the XL file was saved on exit after last (manual/human) changes, does that mean all cells were recalc'd / at their "last calculated value"? When are cell values re-calculated, typically? *(I realize that is more an Excel question than OpenPyXL, but would be grateful for the clarification)* – cssyphus Aug 20 '18 at 15:09
  • 2
    Nevermind, I think I found the answers here: https://stackoverflow.com/questions/36116162/python-openpyxl-data-only-true-returning-none – cssyphus Aug 20 '18 at 15:16
16

data_only : read values for even for the formula cells.

keep_vba: it's used only if you are using macroenabled excel

file_location = 'C:\Arpan Saini\Monsters\Project_Testing\SecCardGrad\SecCardGrad_Latest_docs\Derived_Test_Cases_Secure_Card_Graduate.xlsm'
wb = load_workbook(file_location, keep_vba=True, data_only=True)
Arpan Saini
  • 4,623
  • 1
  • 42
  • 50
12

As @Charlie Clark mentioned you could use xlwings (if you have MS Excel). Here an example

say you have an excel sheet with formulas, for the example I define one with openpyxl

from openpyxl import Workbook, load_workbook
wb=Workbook()

ws1=wb['Sheet']

ws1['A1']='a'
ws1['A2']='b'
ws1['A3']='c'

ws1['B1']=1
ws1['B2']=2
ws1['B3']='=B1+B2'

wb.save('to_erase.xlsx')

As mentioned, if we load the excel again with openpyxl, we will not get the evaluated formula

wb2 = load_workbook(filename='to_erase.xlsx',data_only=True)
wb2['Sheet']['B3'].value

you can use xlwings to get the formula evaluated by excel:

import xlwings as xw
wbxl=xw.Book('to_erase.xlsx')
wbxl.sheets['Sheet'].range('B3').value

which returns 3, the expected value.

I found it quite useful when working with spreadsheets with very complicated formulas and references between sheets.

Nabla
  • 1,509
  • 3
  • 20
  • 35
  • 1
    Does only work on Windows as it seems. xlwings does not install on Linux. – Juergen May 01 '19 at 20:20
  • 2
    indeed, the formulas are evaluated by MS excel. I think this solution will only work in machines with this software (the library should work on mac though) – Nabla May 03 '19 at 08:34
8

Faced the same problem. Needed to read cell values whatever those cells are: scalars, formulae with precomputed values or formulae without them, with fail-tolerance preferred over correctness.

The strategy is pretty straightforward:

  1. if a cell doesn't contain formula, return cell's value;
  2. if it's a formula, try to get its precomputed value;
  3. if couldn't, try to evaluate it using pycel;
  4. if failed (due to pycel's limited support of formulae or with some error), warn and return None.

I made a class which hides all this machinery and provides simple interface for reading cell values.

It's easy to modify the class so that it will raise an exception on step 4, if correctness is preferred over fail-tolerance.

Hope it will help someone.

from traceback import format_exc
from pathlib import Path
from openpyxl import load_workbook
from pycel.excelcompiler import ExcelCompiler
import logging


class MESSAGES:
    CANT_EVALUATE_CELL = ("Couldn't evaluate cell {address}."
                          " Try to load and save xlsx file.")


class XLSXReader:
    """
    Provides (almost) universal interface to read xlsx file cell values.

    For formulae, tries to get their precomputed values or, if none,
    to evaluate them.
    """

    # Interface.

    def __init__(self, path: Path):
        self.__path = path
        self.__book = load_workbook(self.__path, data_only=False)

    def get_cell_value(self, address: str, sheet: str = None):
        # If no sheet given, work with active one.
        if sheet is None:
            sheet = self.__book.active.title

        # If cell doesn't contain a formula, return cell value.
        if not self.__cell_contains_formula(address, sheet):
            return self.__get_as_is(address, sheet)

        # If cell contains formula:
        # If there's precomputed value of the cell, return it.
        precomputed_value = self.__get_precomputed(address, sheet)
        if precomputed_value is not None:
            return precomputed_value

        # If not, try to compute its value from the formula and return it.
        # If failed, report an error and return empty value.
        try:
            computed_value = self.__compute(address, sheet)
        except:
            logging.warning(MESSAGES.CANT_EVALUATE_CELL
                            .format(address=address))
            logging.debug(format_exc())
            return None
        return computed_value                

    # Private part.

    def __cell_contains_formula(self, address, sheet):
        cell = self.__book[sheet][address]
        return cell.data_type is cell.TYPE_FORMULA

    def __get_as_is(self, address, sheet):
        # Return cell value.
        return self.__book[sheet][address].value

    def __get_precomputed(self, address, sheet):
        # If the sheet is not loaded yet, load it.
        if not hasattr(self, '__book_with_precomputed_values'):
            self.__book_with_precomputed_values = load_workbook(
                self.__path, data_only=True)
        # Return precomputed value.
        return self.__book_with_precomputed_values[sheet][address].value

    def __compute(self, address, sheet):
        # If the computation engine is not created yet, create it.
        if not hasattr(self, '__formulae_calculator'):
            self.__formulae_calculator = ExcelCompiler(self.__path)
        # Compute cell value.
        computation_graph = self.__formulae_calculator.gen_graph(
            address, sheet=sheet)
        return computation_graph.evaluate(f"{sheet}!{address}")
John Y
  • 14,123
  • 2
  • 48
  • 72
krvkir
  • 771
  • 7
  • 12
  • You mentioned `pycel`. What a brilliant idea! – Rockallite Dec 15 '18 at 04:26
  • @Rockallite One should know that `pycel` supports only limited set of Excel functions. But for simple cases it works fine. – krvkir Dec 16 '18 at 08:58
  • This has a few issues: TYPE_FORMULA should be from `from openpyxl.cell.cell import TYPE_FORMULA`, gen_graph() is no longer available - use evaluate() directly. – advance512 Aug 07 '21 at 15:43
7

I solved this problem by the following way:

import xlwings
from openpyxl import load_workbook

data = load_workbook('PATH_TO_YOUR_XLSX_FILE')
data['sheet_name']['A1'].value = 1
data.save('PATH_TO_YOUR_XLSX_FILE')

excel_app = xlwings.App(visible=False)
excel_book = excel_app.books.open('PATH_TO_YOUR_XLSX_FILE')
excel_book.save()
excel_book.close()
excel_app.quit()

data = load_workbook('PATH_TO_YOUR_XLSX_FILE', data_only=True)

I hope, this can help You...

Alexey Korolkov
  • 181
  • 2
  • 5
1

Instead on openpyxl, use xlwings.

Ayush
  • 373
  • 4
  • 12
0

I found data_only option is not working properly if there is an "REF!" error cell in a worksheet. Openpyxl returns None for each cell value in my tiny test xlsx file. For me, after opening Excel and fixing the cell, data_only works perfectly. I use openpyxl 3.0.3

0

Rather than use a Python library to do the Excel calculations, I have Excel do them.

Why? It's not pure Python, but it minimizes the amount of Python involved. Instead of using Python to evaluate the Excel formulas, I let Excel handle its own functionality. This avoids any possible bugs in the Python that evaluates the Excel formulas. Here's an outline of how this approach works:

  1. Call openpyxl with data_only=False to edit and then save the spreadsheet.
  2. Use subprocess.Popen to open the new spreadsheet in Excel, and let Excel evaluate the spreadsheet formulas.
  3. Use pynput.keyboard to save the updated spreadsheet and exit Excel.
  4. Use openpyxl with data_only=True to open the updated spreadsheet and get the values of the formulas.

Here is a test program for Windows that creates a new workbook, puts the formula "=SUM(Al:C3)" in cell E2, puts data into cells A1-C3, and evaluates the formula.

from openpyxl import load_workbook, Workbook
from pynput.keyboard import Key, Controller
import subprocess
import time
import os

excel_prog = r'C:\Program Files\Microsoft Office\root\Office16\EXCEL.EXE'

# Create test Excel workbook, get default worksheet.
wb = Workbook()
ws = wb.active

# Put data and a formula into worksheet.
for row_index in range(1,4):
    for column_index in range(1,4):
        ws.cell(row = row_index, column = column_index).value = row_index + column_index
ws['E1'].value = 'Sum of cells in range A1:C3:'
ws['E2'].value = '=SUM(A1:C3)'

# Try to get value of formula.  We'll see the formula instead.
print('E2:', ws['E2'].value)

# Save and close workbook.
wb.save(filename = 'test.xlsx')
wb.close()

# Pause to give workbook time to close.
time.sleep(5)

# Open the workbook in Excel.  I specify folder, otherwise Excel will
# open in "Protected View", interfering with using pynput.
subprocess.Popen([excel_prog, os.path.join(os.getcwd(), 'test.xlsx')])

# Pause to give workbook time to open and for formulas to update.
time.sleep(5)

# Save workbook using pynput.
keyboard = Controller()
with keyboard.pressed(Key.ctrl):
    keyboard.press('s')
    keyboard.release('s')

# Pause to give workbook time to save.
time.sleep(5)

# Close workbook.
with keyboard.pressed(Key.alt):
    keyboard.press(Key.f4)
    keyboard.release(Key.f4)

# Pause to give workbook time to fully close.
time.sleep(5)

# Open Excel workbook and worksheet in openpyxl, data-only.
wb = load_workbook(filename = 'test.xlsx', data_only = True)
ws = wb.active

# Get value of the cell containing the formula.
print('E2:', ws['E2'].value)

# Close workbook.
wb.close()
-1

Xlcalculator has the ability to evaluate a cell.

from xlcalculator import ModelCompiler
from xlcalculator import Model
from xlcalculator import Evaluator

filename = r'xxxx.xlsm'
compiler = ModelCompiler()
new_model = compiler.read_and_parse_archive(filename)
evaluator = Evaluator(new_model)
val1 = evaluator.evaluate('First!A2')
print("value 'evaluated' for First!A2:", val1)

The output is:

value 'evaluated' for First!A2: 0.1

bradbase
  • 409
  • 6
  • 9