How can I append dataframes in a for loop that were generated from try-except

Question

I have code that takes in a list of queries and extracts tables. Then it appends those tables into one dataframe. However, if an error is reached I want to create an empty tableand append it the successful dataframes extracted from the try function so I log all the queries. Is this possible with try-except?

for i in list_of_querys:

    try:    
        create_dataframes
        
        append_dataframes
        
    except Exception as e:
        
        create_empty_dataframe
        
    finally:
        
        append_dataframe_from_try_&_except

Can you provide a concise example of what you're doing? anything like ```df = {'test': [1,2,3]}``` and ```my_list = [5,6,7]``` adding something like it will help to solve your specific needs.. — renan, Sep 01 '22 at 00:36
are the queries in the list of queries SQL, or some kind of list of results like python dictionaries that can be used directly? It would help to provide an example of what's in `list_of_querys` — Joe Carboni, Sep 01 '22 at 00:47
Why would you not just keep a list of dataframes then at the end use `pd.concat` to concatenate them all into one? Perhaps you can clarify the question to answer that. — David Parks, Sep 01 '22 at 00:48
What does "the successful dataframes" mean? You say there is a list of queries; is the idea that each one will produce a dataframe (unless there is an error)? And you want the list to contain the result from each query that didn't have an error? And you have code that will `try` to run each query and add the resulting dataframe to the list? ... Did you try putting the `try`/`except` *inside* the loop? — Karl Knechtel, Sep 01 '22 at 03:47
The data frames are being scraped from our database on our internal website. I believe it is build on SQL but only our IT has access to the SQL side. I am using Selenium to do bulk queries for specific ids. Once I get to the location of the table my scripts runs `table = pd.read_html(str(tableelement))[0]`. Some ids don't have tables or the entry is entirely empty in that when the id is searched a error 500 is returned. If that happens I want to log that id was entered into the query by appending it as an empty data frame with just the id in the id column. — Noah_Seagull, Sep 01 '22 at 15:41
Why do you need to save the id into the dataframe if it failed, rather than just saving a separate list of the failed queries (as I mentioned in my answer - but I now have the ability to comment!)? Otherwise, do you know the columns in advance? If you do, you could create a generic dataframe with all values as `nan` for all columns, then copy it, then if a query fills, copy the generic dataframe and fill in the id column. — Adam Oppenheimer, Sep 03 '22 at 06:45

score 0 · Answer 1 · answered Sep 08 '22 at 17:13

0

My was solved after I nested the code properly. Originally, I had code that was being executed after the try-except block so I was getting an error for the ids that had missing data. Essentially, I needed some rearranging.

answered Sep 08 '22 at 17:13

Noah_Seagull

337
5
18

Adam Oppenheimer · Answer 2 · 2022-09-01T01:06:31.530

Sorry that this should probably be a comment, but I can't comment since my reputation is too low.

Since appending an empty dataframe seems like it might be a bit problematic (e.g. how long would you make it, and what would it contain? If it's just nans, how would that give any information about which query failed?), I will propose an alternative idea: if the query succeeds, add a column indicating the query's index from the list of queries:

import pandas as pd
df = pd.DataFrame()
for i, q in enumerate(list_of_querys):

    try:    
        df_i = create_dataframes(q)

        df_i['query'] = i

        df = pd.concat([df, df_i])
        
    except Exception as e:
        
        pass

Then, you can run q_success = df['query'].unique() to get the indices of the queries that succeeded, and q_failed = set(np.arange(len(list_of_querys))).difference(set(q_success)) to get the indices of the queries that failed.

Then you can access the queries directly via list_of_querys[index].

Alternatively, you can just create lists of the queries that succeeded/failed:

import pandas as pd
df = pd.DataFrame()
query_successes = []
query_failures = []

for i, q in enumerate(list_of_querys):

    try:    
        df_i = create_dataframes(q)

        df = pd.concat([df, df_i])

        query_successes.append(i)
        
    except Exception as e:
        
        query_failures.append(i)

As a final note, in general you shouldn't just use except, you should except specific exceptions (see here).

Please read [answer] and note well that this is **not a discussion forum**. Partial solutions are fine, but state up front your understanding of the problem and what part you are trying to solve. Don't try to talk through it or get more information via the answer section. — Karl Knechtel, Sep 01 '22 at 03:48

How can I append dataframes in a for loop that were generated from try-except

2 Answers2