0

I am creating about 200 variables within a single iteration of a python loop (extracting fields from excel documents and pushing them to a SQL database) and I am trying to figure something out.

Let's say that a single iteration is a single Excel workbook that I am looping through in a directory. I am extracting around 200 fields from each workbook.

If one of these fields I extract (lets say field #56 out of 200) and it isn't in proper format (lets say the date was filled out wrong ie. 9/31/2015 which isnt a real date) and it errors out with the operation I am performing.

I want the loop to skip that variable and proceed to creating variable #57. I don't want the loop to completely go to the next iteration or workbook, I just want it to ignore that error on that variable and continue with the rest of the variables for that single loop iteration.

How would I go about doing something like this?

In this sample code I would like to continue extracting "PolicyState" even if ExpirationDate has an error.

Some sample code:

import datetime as dt
import os as os
import xlrd as rd

files = os.listdir(path)

for file in files: #Loop through all files in path directory  
            filename = os.fsdecode(file) 
            if filename.startswith('~'): 
                continue

            elif filename.endswith( ('.xlsx', '.xlsm') ): 
                try:
                    book = rd.open_workbook(os.path.join(path,file)) 
                except KeyError:
                    print ("Error opening file for "+ file) 
                    continue

                    SoldModelInfo=book.sheet_by_name("SoldModelInfo")
                    AccountName=str(SoldModelInfo.cell(1,5).value)
                    ExpirationDate=dt.datetime.strftime(xldate_to_datetime(SoldModelInfo.cell(1,7).value),'%Y-%m-%d')
                    PolicyState=str(SoldModelInfo.cell(1,6).value)
                    print("Insert data of " + file +" was successful")
            else:
               continue               
Coldchain9
  • 1,373
  • 11
  • 31
  • Chunk under the `except` after `continue` isn't even reachable? – DirtyBit Feb 19 '19 at 14:49
  • 1
    Your indentation looks off. You field extractions should be *after* the `except KeyError`, not part of it. You also need to catch (and apparently ignore) errors for each assignment separately. – chepner Feb 19 '19 at 14:49
  • Sorry I have code of over 500 lines so it was a pain to provide a sample of it. I would need to do a try-except for every single variable? I guess that isnt a bad solution I just didnt know if that was what it would take. Also, the code under the except IS reachable. I have ran this code many times and it works. If it hits the except it goes to next loop, if not it goes to variable extraction. – Coldchain9 Feb 19 '19 at 14:52
  • Are your data all consisted of one row within the sheet? It's a bit confusing in your example as each iteration is per file not per row in the `xls`. Because my next question would be if they were in multiple rows, if one row has erroneous data is the entire row thrown out or just skip the particular error value. – r.ook Feb 19 '19 at 15:01
  • I think what @chepner is saying regarding your indentation is that the `SoldModelInfo` parts should be unindented one level, because once your code hits `continue` it would no longer execute the codes under `SoldModelInfo`. Possibly in your code it is indented correctly just not in your sample. – r.ook Feb 19 '19 at 15:02
  • @Idlehands No. The data is among 5 separate tabs and is located in all sorts of places. Its generally in the same place because each excel sheet starts as a template and people fill them out. The other fields within each sheet exist all over the place. This is not by my design I am just left with the task of getting the data out of these things. – Coldchain9 Feb 19 '19 at 15:12
  • If there is no data regularity to be found, a long list of try blocks is all you can do. Otherwise, write functions to encapsulate a particular form of try block and call repeatedly . For example `SomethingDate = my_date_decode( something, errors)` – nigel222 Feb 19 '19 at 15:17

3 Answers3

2

Use multiple try blocks. Wrap each decode operation that might go wrong in its own try block to catch the exception, do something, and carry on with the next one.

            try:
                book = rd.open_workbook(os.path.join(path,file)) 
            except KeyError:
                print ("Error opening file for "+ file) 
                continue

            errors = []

            SoldModelInfo=book.sheet_by_name("SoldModelInfo")
            AccountName=str(SoldModelInfo.cell(1,5).value)
            try:
                ExpirationDate=dt.datetime.strftime(xldate_to_datetime(SoldModelInfo.cell(1,7).value),'%Y-%m-%d')
            except WhateverError as e:
                # do something, maybe set a default date?
                ExpirationDate = default_date
                # and/or record that it went wrong?
                errors.append( [ "ExpirationDate", e ])
            PolicyState=str(SoldModelInfo.cell(1,6).value)
            ...
            # at the end
            if not errors:
                print("Insert data of " + file +" was successful")
            else:
                # things went wrong somewhere above. 
                # the contents of errors will let you work out what
nigel222
  • 7,582
  • 1
  • 14
  • 22
1

As suggested you could use multiple try blocks on each of your extract variable, or you could streamline it with your own custom function that handles the try for you:

from functools import reduce, partial

def try_funcs(cell, default, funcs):
    try:
        return reduce(lambda val, func: func(val), funcs, cell)
    except Exception as e:
        # do something with your Exception if necessary, like logging.
        return default

# Usage:

AccountName = try_funcs(SoldModelInfo.cell(1,5).value, "some default str value", str)
ExpirationDate = try_funcs(SoldModelInfo.cell(1,7).value), "some default date", [xldate_to_datetime, partial(dt.datetime.strftime, '%Y-%m-%d')])
PolicyState = try_funcs(SoldModelInfo.cell(1,6).value, "some default str value", str)

Here we use reduce to repeat multiple functions, and pass partial as a frozen function with arguments.

This can help your code look tidy without cluttering up with lots of try blocks. But the better, more explicit way is just handle the fields you anticipate might error out individually.

r.ook
  • 13,466
  • 2
  • 22
  • 39
  • Would this have any performance advantage compared to multiple try-excepts? – Coldchain9 Feb 19 '19 at 15:32
  • No. In essence it's doing the same thing, and if you are mostly passing in one function at a time you would need to update the `try_funcs` to not use reduce each time. This just helps tidy up your code without all the `try` blocks and help provide a default value in the same line. – r.ook Feb 19 '19 at 15:34
  • Is there a way to append the variable name that I am setting the try_funcs function to? Ie. could I append "ExpirationDate" automatically if it had an exception when the try_funcs function is called? – Coldchain9 Feb 19 '19 at 15:38
  • Not automatically without jumping through some hoops (I'd imagine `inspect` module can come in handy but I'm not sure if it'd work or is worth the trouble)... probably better to just update `try_funcs` to also include the variable name as a `string` if that matters to you, then just log the variable name as part of the `except` clause. `try_funcs('ExpirationDate', SoldModelInfo.cell(1,7).value), "some default", [some_funcs, ...])` – r.ook Feb 19 '19 at 15:44
  • Here's a relevant question regarding inspecting the variable name if you're interested: https://stackoverflow.com/questions/8875202/can-an-object-inspect-the-name-of-the-variable-its-been-assigned-to – r.ook Feb 19 '19 at 15:46
1

So, basically you need to wrap your xldate_to_datetime() call into try ... except

import datetime as dt

v = SoldModelInfo.cell(1,7).value

try:
    d = dt.datetime.strftime(xldate_to_datetime(v), '%Y-%m-%d')
except TypeError as e:
    print('Could not parse "{}": {}'.format(v, e)
alberand
  • 632
  • 5
  • 13
  • xldate_to_datetime is a proprietary function I have created. I am not using the one from xlrd. – Coldchain9 Feb 19 '19 at 16:11
  • @Coldchain9 I don't understand what you mean. What exception occurs when date is in wrong format? – alberand Feb 19 '19 at 16:16
  • I believe the error is coming from Excel because the user typed in '9/31/2015' so Excel doesnt recognize it as a date thus sends it in as a string instead of a float date which is the usual case when extracting an xldate. `TypeError: unsupported type for timedelta days component: str ` from function `def xldate_to_datetime(xldate): temp=dt.datetime(1899,12,30) delta=dt.timedelta(days=xldate) return temp+delta` – Coldchain9 Feb 19 '19 at 16:17
  • Ok, I updated the sample code. Instead of `ValueError` you need to catch `TypeError` – alberand Feb 19 '19 at 16:27