Python/Pandas: Trouble with Excel cells that appear blank but are not empty

Question

I am running the below code. Everything works except that after running the last line I get a "TypeError: unsupported operand type(s) for /: 'str' and 'int'" error.

The issue, it turns out, relates to a particular Excel cell issue that seems to be prevalent in more than one of the +80 files I am reading into Python. Sometimes some of the cells in the Excel files I am reading read appear to be blank in Excel but aren't empty (this can be checked in Excel via the ISBLANK() function). Reading and importing empty cells isn't an issue, but importing/reading empty cells that empty to be blank but aren't registered as so by Excel are a problem.

Opening the Excel file and manually selecting and deleting the cell (which appeared empty, but somehow register as non-blank in Excel) seems to do the trick. However, I don't want to open every Excel file and start selecting and deleting empty cells for reassurance. Note also that it wasn't that the Excel cell in question just contained a space (ie, ""), it didn't so the problem seems to be something else.

How can I deal with this from with Python/Pandas? I tried all of the solutions in this thread and nothing seemed to work.

import openpyxl 
import os
import pandas as pd
import time
from openpyxl import load_workbook

os.chdir('C://Files//Research')
directory = os.listdir('C://Files//Research') 

df = pd.DataFrame()

start = time.time()
for file in directory:      
    if os.path.isfile(file):

        file_name = file
        workbook = load_workbook(filename = file, data_only=True)

        sheet1 = workbook['3. ISO']
        c5 = sheet1['C5'].value 
        c6 = sheet1['C6'].value

        sheet11 = workbook['4. Survey Overview']
        c10 = sheet11['C10'].value 
        c11 = sheet11['C11'].value 

        df = df.append(pd.DataFrame({
                "File_name":file, "ISO":c5, "Output":c6,
                "Nat":c10, "Urb": c11 
                }, index=[0]))

end = time.time()
print(end - start)

df['Output %'] = 100* (df['Output'] / df['Nat'])

Without the actual data, it's really hard to tell, but try: `df['Output'].str.strip().astype(float)` and the same with `df['Nat']`. — elPastor, Mar 25 '20 at 21:01
Thanks. I tried this previously and I get a 'ValueError: could not convert string to float:' error. — StatsScared, Mar 25 '20 at 21:08
As I said, without providing the data, it's pretty much impossible to diagnose. — elPastor, Mar 25 '20 at 21:09
Maybe there's a way to completely trim all cells without a numeric or string? Not sure. Also, knowing at least what row has the error (each row corresponds to one file) would also be helpful. Worst case I go back to the files and fix them myself, but I would need to identify the problematic files first.. — StatsScared, Mar 25 '20 at 21:39

Python/Pandas: Trouble with Excel cells that appear blank but are not empty

0 Answers0