18

Is there a way to have pandas read in only the values from excel and not the formulas? It reads the formulas in as NaN unless I go in and manually save the excel file before running the code. I am just working with the basic read excel function of pandas,

import pandas as pd

df = pd.read_excel(filename, sheetname="Sheet1")

This will read the values if I have gone in and saved the file prior to running the code. But after running the code to update a new sheet, if I don't go in and save the file after doing that and try to run this again, it will read the formulas as NaN instead of just the values. Is there a work around that anyone knows of that will just read values from excel with pandas?

Colton T
  • 300
  • 2
  • 4
  • 15
  • Is your excel spreadsheet in auto calculation mode? – Zeugma Jan 18 '17 at 15:37
  • Yes formulas are set to auto calculate. – Colton T Jan 18 '17 at 16:48
  • 1
    Weird, is your original file saved by a human or coming from a program before you resale it? – Zeugma Jan 18 '17 at 17:09
  • Saved by a person, we pull in data and then create formulas based on that. But the data we pull in we just copy and pasted into a file and then saved. – Colton T Jan 18 '17 at 18:51
  • I also faced similar issue, however it occurs only for first few rows, for rest it picks up the values processed by the formula. – Sanjoy Oct 29 '18 at 15:32
  • I faced a similar issue. It occurred after writing to the Excel file using the openpyxl engine. That must have somehow corrupted the file. – Christian Jul 20 '20 at 13:00

3 Answers3

11

That is strange. The normal behaviour of pandas is read values, not formulas. Likely, the problem is in your excel files. Probably your formulas point to other files, or they return a value that pandas sees as nan.

In the first case, the sheet needs to be updated and there is nothing pandas can do about that (but read on).

In the second case, you could solve by setting explicit nan values in read_excel:

pd.read_excel(path, sheetname="Sheet1", na_values = [your na identifiers])

As for the first case, and as a workaround solution to make your work easier, you can automate what you are doing by hand using xlwings:

import pandas as pd
import xlwings as xl

def df_from_excel(path):
    app = xl.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    return pd.read_excel(path)

df = df_from_excel(path to your file)

If you want to keep those formulas in your excel file just save the file in a different location (book.save(different location)). Then you can get rid of the temporary files with shutil.

RobatStats
  • 435
  • 3
  • 9
  • 2
    Yeah I knew I could automate the saving, I was just trying to figure out if there was a way around that. The formulas don't point to other files, they are just taking the value from one column and subtracting the sum of values of the other columns, so they just return numbers. – Colton T Jan 20 '17 at 17:22
  • 1
    Writing this in Sept, 2020 - pd.read_excel() now reads Values even when the formulas point to other files. – Abhishek Poojary Sep 03 '20 at 09:18
  • 4
    Writing this in Jan 2021 - pd.read_excel() only reads in values if the notebook has been "opened" hence the need to simulate opening and saving the file with xlwings. As far as I can tell, formulas aren't calculated until the workbook is opened, which is why cells with formulas have no value associated with them when the workbook is only updated by the script – William Daly Jan 25 '21 at 21:53
0

I had this problem and I resolve it by moving a graph below the first row I was reading. Looks like the position of the graphs may cause problems.

0

you can use xlrd to read the values. first you should refresh your excel sheet you are also updating the values automatically with python. you can use the function below file = myxl.xls

import xlrd
import win32com.client
import os

def refresh_file(file):
    xlapp = win32com.client.DispatchEx("Excel.Application")
    path = os.path.abspath(file)
    wb =  xlapp.Wordbooks.Open(path)
    wb.RefreshAll()
    xlapp.CalculateUntilAsyncqueriesDone()
    wb.save()
    xlapp.Quit()

after the file refresh, you can start reading the content.

workbook = xlrd.open_workbook(file)
worksheet = workbook.sheet_by_index(0)
for rowid in range(worksheet.nrows):
    row = worksheet.row(rowid)
    for colid, cell in enumerate(row):
        print(cell.value)

you can loop through however you need the data. and put conditions while you are reading the data. lot more flexibility

Rajat Tyagi
  • 320
  • 5
  • 9