
I'm working in a Jupyter notebook. I have a large amount of data that I have to load initially and then work with. I don't want to have to reload it every time I shut down and restart my laptop or the notebook. I'm wondering: when I save and checkpoint the notebook each time, does it save the data that has been loaded and all the work I've done, so that if I closed the notebook and re-opened it later I could just start working where I'd left off? Or do I need to use something like pickle? If so, could someone please provide an example of how I could use pickle or something similar to save my data and work and reload it later?

In R I would just save an .RData file and load the file later. I'm a little new to Python.

Update:

Code:

print(df_business[1:3])

Sample Data:

               address                                         attributes  \
1       2824 Milton Rd  {u'GoodForMeal': {u'dessert': False, u'latenig...   
2  337 Danforth Avenue  {u'BusinessParking': {u'garage': False, u'stre...   

              business_id                                         categories  \
1  mLwM-h2YhXl2NCgdS84_Bw  [Food, Soul Food, Convenience Stores, Restaura...   
2  v2WhjAB3PIBA8J8VxG3wEg                               [Food, Coffee & Tea]   

        city                                              hours  is_open  \
1  Charlotte  {u'Monday': u'10:00-22:00', u'Tuesday': u'10:0...        0   
2    Toronto  {u'Monday': u'10:00-19:00', u'Tuesday': u'10:0...        0   

    latitude  longitude                                name neighborhood  \
1  35.236870 -80.741976  South Florida Style Chicken & Ribs     Eastland   
2  43.677126 -79.353285                    The Tea Emporium    Riverdale   

  postal_code  review_count  stars state  
1       28215             4    4.5    NC  
2     M4K 1N7             7    4.5    ON  

Update2:

Code:

import pickle

your_data = df_business

# Store data (serialize)
with open('filename.pickle', 'wb') as handle:
    pickle.dump(your_data, handle, protocol=pickle.HIGHEST_PROTOCOL)

# Load data (deserialize)
with open('filename.pickle', 'rb') as handle:
    unserialized_data = pickle.load(handle)
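Since df_business looks like a pandas DataFrame (judging from the sample output above), a one-line alternative sketch uses pandas' own to_pickle/read_pickle wrappers, which pickle the frame for you; the file name here is just an assumption:

import pandas as pd

# Save the DataFrame to disk (pandas pickles it under the hood)
df_business.to_pickle('df_business.pkl')

# In a later session, read it back without re-parsing the raw data
df_business = pd.read_pickle('df_business.pkl')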
user3476463

1 Answer


For me, as long as I do not end the kernel that I'm running for that notebook, I can resume it at any point later on. If you are going to need to restart your computer (and hence terminate the kernel you are using), then you will need to either re-run your notebook cells or load precomputed data using pickle.

Information on using pickle can be found in this answer.
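A minimal sketch of that workflow (the cache and source file names here are assumptions, not taken from the question): a cell near the top of the notebook loads the cached pickle when it exists and only falls back to parsing the raw JSON when it does not:

import os
import pickle

import pandas as pd

CACHE_FILE = 'df_business.pickle'   # hypothetical cache file name

if os.path.exists(CACHE_FILE):
    # A previous session already saved the parsed DataFrame, so load the cache.
    with open(CACHE_FILE, 'rb') as handle:
        df_business = pickle.load(handle)
else:
    # First run (or cache deleted): parse the raw JSON once, then cache the result.
    df_business = pd.read_json('business.json', lines=True)   # hypothetical source file
    with open(CACHE_FILE, 'wb') as handle:
        pickle.dump(df_business, handle, protocol=pickle.HIGHEST_PROTOCOL)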

jonathanking
  • Thank you for getting back to me and the link. So is the idea that I would have to save out each object using pickle and then reload them? For example, I've updated the original post with a sample of a large DataFrame I read in from JSON data. If I didn't want to have to wait every time I restarted my laptop to re-load and parse the JSON data, would I just save the DataFrame as a pickle object and then re-load it? Could you please provide an example of the code? – user3476463 Oct 26 '17 at 16:45
  • You are correct. An example would look nearly identical to the answer I linked to. The functionality you need is the simplest use case of pickle and I'm sure many thorough examples of it exist already. pickle.dump() your object into a pickle file, then pickle.load() your file into an object. – jonathanking Oct 26 '17 at 16:55
  • Thank you again for the tip. I've added another update to the original post with what I think the code to save and load the pickle file should look like. Is it the correct idea? – user3476463 Oct 26 '17 at 18:01
  • That looks great! If you expect to shut down your computer, then it might be handy to pickle objects when you are done modifying them. Upon restarting your computer, jump ahead in your jupyter notebook to a point where you load them. – jonathanking Oct 26 '17 at 19:04
  • If I re-run the store step from the above code will it save over the previous version of the saved pickle file? – user3476463 Oct 27 '17 at 16:32
  • It certainly will. I'm not sure what your exact workflow is, but if this seems like a lot of work, you could include all of your saving and loading operations in a single function that you define. Then when closing and opening your notebook, you just call those functions. Personally, I don't have a problem either keeping the notebook open or rerunning my analyses but I could definitely understand why other people might need a little more continuity than myself. – jonathanking Oct 28 '17 at 14:36
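A minimal sketch of the save/load helper functions suggested in the last comment (the function names, file name, and dict keys are hypothetical):

import pickle

def save_state(objects, path='notebook_state.pickle'):
    # objects is a dict such as {'df_business': df_business}
    with open(path, 'wb') as handle:
        pickle.dump(objects, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_state(path='notebook_state.pickle'):
    # Returns the dict that save_state() wrote
    with open(path, 'rb') as handle:
        return pickle.load(handle)

# Before shutting down the laptop:
# save_state({'df_business': df_business})

# After reopening the notebook:
# df_business = load_state()['df_business']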