
I have created a Pandas dataframe:

    import pandas as pd

    scores = pd.DataFrame(
        {"batch_size": list(range(64)),
         "learning_rate": list(range(64)),
         "dropout_rate": list(range(64)),
         # independent lists per row ([[0]] * 64 would create 64 references to one list)
         "accuracies": [[0] for _ in range(64)],
         "loss": [[0] for _ in range(64)],
         "training_time": list(range(64)),
         }, index=list(range(64)))

Then, in a loop, I run 64 models and append the results to the lists.
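The loop looks roughly like this (a sketch; `train_model` is a placeholder for my actual training code):

    def train_model(params):
        # Placeholder for the real training run; returns accuracy, loss, seconds.
        return 0.9, 0.1, 120

    for i in range(64):
        acc, loss, seconds = train_model(scores.loc[i])
        scores.at[i, "accuracies"].append(acc)
        scores.at[i, "loss"].append(loss)
        scores.at[i, "training_time"] = seconds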

The loop is still running and I don't expect it to finish before my deadline. Therefore, I would like to stop the loop and continue with the information that has been stored in `scores` so far. However, I only want to do this if I can still access the dataframe after terminating the loop.

Can I use the dataframe with intermediate results if I terminate the loop while it's still running?

Emil
  • How are you planning on terminating the loop? Are you saving the DF to a temp file or something while the loop is running, or is it just in memory? How do you plan on accessing the DF later? – MattDMo Jul 07 '20 at 17:57
  • It's just in memory now. I would like to export it to CSV afterwards. – Emil Jul 07 '20 at 18:01
  • You should have a look at the `pandas.DataFrame.to_csv` method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html – pyOliv Jul 07 '20 at 20:24
  • Store a pointer to which models have been executed (1, 2, 3, ...), save the results one by one, and assemble the dataframe after you have all the results? (See the sketch below these comments.) – Evgeny Jul 07 '20 at 21:37
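
For reference, the comments above amount to something like this: interrupt the loop with Ctrl+C (a `KeyboardInterrupt`) instead of killing the console, so the partially filled dataframe stays in memory, then export it (a sketch; `run_model` is a hypothetical stand-in for one training run):

    def run_model(i):
        # Hypothetical stand-in for one full training run; returns a training time.
        return i * 2

    try:
        for i in range(64):
            scores.at[i, "training_time"] = run_model(i)
    except KeyboardInterrupt:
        pass  # scores keeps every row filled in before the interrupt

    scores.to_csv("intermediate_scores.csv")  # export the intermediate results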

1 Answer

  1. If possible, I would prioritize pandas methods over for loops, as that addresses the core problem. Even better, if you can replace the for loops with pandas methods and want faster execution still, many pandas operations are also available in Dask, a Python library for big data. It is a little more advanced, but I was in a similar position on a large project and Dask was a great solution; it took a day or so to get used to the library and port my code from pandas to Dask.
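
     For illustration, a minimal sketch with `dask.dataframe` (assuming `dask` is installed; the column operation is only a placeholder):

    import dask.dataframe as dd

    # Split the pandas frame into partitions that Dask can process in parallel.
    ddf = dd.from_pandas(scores, npartitions=4)
    # Operations are lazy; doubling a column here is just a placeholder.
    ddf["training_time"] = ddf["training_time"] * 2
    scores_fast = ddf.compute()  # materialize the result back into pandas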

  2. If you want to keep your code as is and stay in pandas, and it is still taking too long to process, then I would look into splitting the dataframe into chunks:

    n = 100000  # rows per chunk
    scores_df_list = [scores[i:i + n] for i in range(0, scores.shape[0], n)]
    for i, df in enumerate(scores_df_list, start=1):
        # ...run the expensive per-chunk work here...
        df.to_csv(f'file{i}.csv')  # save each chunk as soon as it is done
    
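To reassemble the pieces afterwards, the chunk files can be read back and concatenated (a sketch, assuming the `file{i}.csv` naming used above):

    import pandas as pd

    # Read the chunks back in index order and stitch them into one frame.
    parts = [pd.read_csv(f'file{i}.csv', index_col=0)
             for i in range(1, len(scores_df_list) + 1)]
    combined = pd.concat(parts)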

See the answer by @ScottBoston here, and kindly upvote his solution if it helps: Pandas - Slice Large Dataframe in Chunks.

David Erickson
  • This is not exactly an answer to my question, but thanks for the suggestion, as I was not aware of the `dask` library. Also, I was doing a grid search and stored the intermediate results in my dataframe, so I think a for loop was the best approach here. – Emil Jul 09 '20 at 12:11