
Let's say I have created a dataframe in one jupyter notebook:

In notebook_1:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

Now what is the best (easiest, fastest, most reliable, ...) way of moving df into another notebook, so that I could work with it there?

In notebook_2:

df = ...  # do something here to load the data from the df in notebook_1
# and now use df further on

I came up with the following ways that (except possibly the last one) all work:

  • export to a file and then import using one of the pandas IO tools - probably a good solution for very large dataframes, but otherwise it seems unnecessarily complicated (the first three options are sketched in code after this list)
  • use pd.DataFrame.to_clipboard and pd.read_clipboard - looks good for small dataframes, but according to this answer it is not 100% reliable, it probably won't work for larger dataframes, and if I rerun notebook_2 after the clipboard content has been replaced with something else, it will no longer work
  • use pd.DataFrame.to_json with orient='table' and path_or_buf=None, copy-paste the output, and parse it back with pd.read_json - works well for my case (a relatively small dataframe) - the upside is that the imported data is visible directly in notebook_2 as plain text, so that notebook becomes self-contained once I have copied the data over
  • copy-paste the whole cell containing the output of df - I couldn't test whether this copies the actual data or just the code, because copy-pasting cells doesn't work for me at all, but I doubt that it copies the data
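
To make the comparison concrete, here is a minimal sketch of the first three options (the file name is illustrative, and the clipboard variant assumes a working system clipboard):

import io

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

# Option 1: round-trip through a file (CSV here; parquet or pickle work similarly)
df.to_csv('df.csv')
df_file = pd.read_csv('df.csv', index_col=0)  # index_col=0 restores the saved index

# Option 2: round-trip through the system clipboard
df.to_clipboard()
df_clip = pd.read_clipboard()

# Option 3: serialize to a JSON string, copy-paste the string into notebook_2,
# and parse it back there
json_str = df.to_json(orient='table')
df_json = pd.read_json(io.StringIO(json_str), orient='table')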

Edit - Options mentioned in the comments or answers:

  • the IPython %store magic as mentioned by Chris (illustrated below) - really simple and nice; the only downside is that notebook_2 will not be self-contained
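
For reference, the %store workflow is just two lines:

In notebook_1:

%store df

In notebook_2:

%store -r df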

But I would like to know if there are other possible methods, and what their advantages, disadvantages, and caveats are.

I am more interested in insights, comparisons, etc. than in just one way to do it (unless there is one clearly best way that works perfectly in all scenarios and has no disadvantages).

Thanks.

PGlivi
  • The way I solved this was by pickling the df to disk, and in the other notebook I'd load it at the beginning (sketched after these comments). This allows me to start/stop jupyter (the use case was on my own computer), and even have versioning, allowing me to use a previous version of the data. I chose pickle because the files aren't huge (max 15 MB), it was all on my computer, and I save other python objects too. Pickle was the solution for me, but it's neither the [most secure](https://docs.python.org/3/library/pickle.html) nor the most [performant solution](https://www.benfrederickson.com/dont-pickle-your-data/). – MkWTF Jan 15 '20 at 09:30
  • Does this answer your question? [Share data between IPython Notebooks](https://stackoverflow.com/questions/31621414/share-data-between-ipython-notebooks) – Chris Jan 15 '20 at 09:32
  • @Chris: The %store magic mentioned in your link is an option (a pretty neat one, at least for some cases) that I didn't know of. Thank you. However, this doesn't answer the question completely. For instance, it has the downside that notebook_2 will not be self-contained (it will depend on the state of the IPython database) - this may be a problem in some cases. – PGlivi Jan 15 '20 at 13:24
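
A minimal sketch of the pickle approach from MkWTF's comment (the file name is illustrative):

In notebook_1:

df.to_pickle('df.pkl')

In notebook_2:

df = pd.read_pickle('df.pkl')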

1 Answer


Export your dataframe to disk using

df.to_csv('your_name.csv')

then go to the other notebook and read it using

df = pd.read_csv('your_name.csv', index_col=0)  # index_col=0 restores the index that to_csv wrote

Bharath_Raja
  • I mentioned this option already in my question. But it seems unnecessary to use a local file for this... – PGlivi Jan 15 '20 at 13:29