pandas: remove all formula from xlsx/xls

Question

I use a script that converts a xls file with formula in some cells to a xlsx file. The Problem is that the cells which had formula are now displayed as "0" in the xlsx file.

Is there a way to get rid of formula and get only the value instead? Couldn't find a solution in different threads yet. Also I have no experience with openpyxl which are a lot referring to in this case

import os
import re
import pandas as pd
import glob
import time
import xlwt


   
def setwd():
    from pathlib import Path
    import os
    global home
    global save_dir
    
    home = str(Path.home())
    os.chdir(home + r'\...\...\Staffing Report\Input\MyScheduling\Raw_Data')
    
    latest = home + r'\...\Staffing Report\Input\MyScheduling\Raw_Data'
    
    folders = next(os.walk(latest))[1]
    creation_times = [(folder, os.path.getctime(folder)) for folder in folders]
    creation_times.sort(key=lambda x: x[1])
    
    most_recent = creation_times[-1][0]
    print('test' + most_recent)
    
    os.chdir(latest + '\\' + most_recent + '\\')
    
    print('current cwd is: ' + os.getcwd())
    
    
    save_dir = home + '\...\Input\MyScheduling\PBI\\' + 'Individual_Status.xlsx'
    
    return save_dir
    
def rowdrop():
    global df
    global fdt
    einlesen = os.getcwd()
    print('test einlesen: ' + einlesen)
    
 
    df = pd.read_excel('Individual Status.xls', sheet_name = 'Individual Status Raw Data')
    df = df.iloc[6:]  
    df = df.rename(columns=df.iloc[0]).drop(df.index[0])
    
    fdt = pd.read_excel('Forecast Detail and Totals.xls', sheet_name = 'Forecast',)
    fdt = fdt.iloc[0:]
    #fdt = fdt.drop(['Code'],axis=1)
    
    os.chdir(home + '\...\Input\MyScheduling\PBI')
    
    print('Current CWD before export is: ' + os.getcwd())
    
    df.to_excel('Individual Status.xlsx', index = False)
    
    fdt.to_excel('Forecast Detail and Totals.xlsx', index = False)
    
    return df
    
    
#main

setwd()
rowdrop()

score 0 · Answer 1 · answered Feb 23 '21 at 07:32

0

I'm guessing this is because there aren't actually any values in the cells while Pandas tries to load it in the dataframe. There's another thread talking about a similar case, but is dealt with openpyxl.

Read Excel cell value and not the formula computing it -openpyxl

answered Feb 23 '21 at 07:32

Pv66

136
1
9

Would that mean that I have to rebuild the whole script to use openpyxl? – ChristianP Feb 23 '21 at 07:59
I can see that you're putting in a dataframe into the excel file. So instead of reading the excel into the dataframe as a first step, you can try reading using openpyxl, shouldn't be much of a trouble if it helps you resolve the unnecessary '0' appearing in your file. – Pv66 Feb 23 '21 at 10:33
However one more thread which looks doable in your case is straight up use the dataframe so that it can be used by an openpyxl object. Which can be found here [link](https://stackoverflow.com/questions/36657288/copy-pandas-dataframe-to-excel-using-openpyxl) – Pv66 Feb 23 '21 at 10:35
The stupid thing is that I cant open the xls file with openpyxl?! Otherwise I could apply your solution – ChristianP Feb 23 '21 at 14:55
What do you mean by you cannot open the file? Openpyxl is extensively used for this use case. Anyway, try this: `from openpyxl import load_workbook` `workbook = load_workbook('/path/to/file.xls')` – Pv66 Feb 24 '21 at 15:11
When I try to read the xls file as you suggested, I get an error message: openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format. – ChristianP Feb 26 '21 at 07:49

pandas: remove all formula from xlsx/xls

1 Answers1