I'm developing a Jupyter Notebook for my team to use to catalogue and analyse some proprietary data. I'm ready to share it with the team for on-going execution and development. The team generally have Windows 10 workstations and are skilled engineers, though not data scientists. No one currently uses Jupyter.

I now realise I might have thoroughly misjudged Jupyter's ability to support this sort of working environment.

Option 1: Individual installations

This is the worst-case scenario. Anyone who wants to run or modify the notebook needs to install Jupyter. Anaconda is probably the best way to go, but it's a big, ugly, scary install. Worse, every user will have to install and manage additional libraries. Any notebook change that requires a kernel change will have to be manually applied to each installation.
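If Option 1 ends up being the route, one way to soften the library management burden is to have the notebook's first cell report what is missing on that workstation. The following is only a minimal sketch: the package list is hypothetical and would need to match whatever the notebook actually imports.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list: replace with the distributions the notebook really needs.
REQUIRED_PACKAGES = ["numpy", "pandas", "matplotlib"]


def find_missing(packages):
    """Return the names in `packages` that are not installed in this environment."""
    missing = []
    for name in packages:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing


missing = find_missing(REQUIRED_PACKAGES)
if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All required packages are installed.")
```

This does not remove the need to update each installation by hand, but it at least tells each user what to install before the rest of the notebook fails.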

Surely, being client-server, this is not the intention of Jupyter.

Option 2: One server, many clients

The obvious alternative is to host the Jupyter server on a network-accessible computer and have all users connect to it with a browser. That way there's only one shared installation to manage and each user just needs a URL to access it.

But there's a gotcha - the server expects the notebook to be on its own file system! So every user will access the same notebook file. This makes version control very problematic - no one can check out their own copy of the notebook for independent edit and commit sessions. Instead, changes will overwrite the only copy, and commits/reverts/diffs will have to be done on the server (or by mounting the server's file system).

Option 3: Server in Docker image, each user runs a container

Docker to the rescue? That way we can build/maintain one server image (and even version control it) and each user only needs to have a Docker engine installed to instantiate the image (which is a friendly 8GB download!!). They connect to their own container which, with a bit of scripting trickery, will be pointing at their own copy of the notebook.
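For reference, the "scripting trickery" could look roughly like the sketch below, which builds a `docker run` command that bind-mounts a user's own notebook folder into the container. The image name `jupyter/base-notebook`, the container path `/home/jovyan/work`, the port and the example Windows path are all assumptions for illustration, not the setup actually used here.

```python
import subprocess

# Assumptions for illustration only: image name and container work directory.
IMAGE = "jupyter/base-notebook"
CONTAINER_WORK_DIR = "/home/jovyan/work"


def docker_run_command(host_notebook_dir, port=8888, image=IMAGE):
    """Build a `docker run` argument list that mounts the user's own notebook folder."""
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:8888",  # host port -> Jupyter's port inside the container
        "-v", f"{host_notebook_dir}:{CONTAINER_WORK_DIR}",  # the user's own copy
        image,
    ]


def launch(host_notebook_dir):
    """Start the container; requires a working Docker engine on the workstation."""
    subprocess.run(docker_run_command(host_notebook_dir), check=True)


if __name__ == "__main__":
    # Hypothetical path; print the command rather than running it.
    print(" ".join(docker_run_command(r"C:\Users\alice\notebooks")))
```

Each user would run this against their own checkout, so edits and commits stay on their own machine while the server environment comes from the shared image.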

This option only took 20 hours to investigate before discovering that it fundamentally sucks. Working with the kernel is tricky, with lots of new skills necessary. But the real showstopper is that nothing that shows a Qt window will work. The qtconsole we can do without, but part of our notebook shows a File Open dialog and the best way to do that is with a Qt Widget. With the server in a Docker container expecting an X Windows environment, and the client in a Windows browser, the Widget cannot be shown.

The Qt issue was the last of many, many issues trying to get the Docker option running. Everything from matplotlib to path mapping, from os library calls to ipywidgets needed to be investigated, tweaked, Googled, chopped and changed to work. I'm fairly convinced that these dramas would be on-going.

Conclusion

There are lots of discussions around Jupyter version control. There are lots of options for read-only sharing. And there's even a project for runtime-building a Docker container to provide executable access to a notebook. But there is scant advice on using Jupyter in a team environment.

Given the endless complications when the server is not natively running on the same machine as the client, I'm starting to believe Option 1 is the only sane way to go. Before I go to my colleagues with the crappy news, are there any other suggestions?

Heath Raftery
  • If you're OK with using a commercial, hosted offering, have a look at datascience.ibm.com. We had to implement a lot of customizations to make the Jupyter notebooks work well in a team environment. I don't think you'll find an easy way to roll your own solution. – Roland Weber Jun 01 '17 at 11:42
  • In our situation we're not okay with that, but great to be aware of it. Good to know it's not something I should be expecting out of the box. – Heath Raftery Jun 02 '17 at 23:21

1 Answer

Ended up having a fruitful discussion on the Jupyter Google Group and have confirmed that out-of-the-box, Jupyter does not support this sort of working environment. Indeed, crucially, Jupyter expects the server to have a single user.

The most promising DIY solution was firstly to deploy JupyterHub, for two reasons:

  1. It launches a new server instance for each user, preventing any multi-user per server issues.
  2. It prompts for users to identify themselves, allowing different actions to be taken depending on the user.

And secondly, have the server mount each user's file system (or equivalent network architecture), so it can point the user back to their own local files.
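As a rough, untested sketch of how those two pieces might fit together in a `jupyterhub_config.py`: the helper below sets `Authenticator.allowed_users` (available in JupyterHub 1.2 and later) and `Spawner.notebook_dir`, whose `{username}` placeholder JupyterHub expands per user. The function name `apply_team_settings`, the user names and the mount point `/mnt/team-share` are hypothetical.

```python
from types import SimpleNamespace


def apply_team_settings(c, allowed_users, share_root):
    """Point each user's single-user server at their own folder under share_root."""
    # Only these users may log in to the hub.
    c.Authenticator.allowed_users = set(allowed_users)
    # JupyterHub replaces {username} with the name of the logged-in user,
    # so each spawned server opens in that user's own directory.
    c.Spawner.notebook_dir = share_root.rstrip("/") + "/{username}"
    return c


# Standalone demo with a stand-in config object. Inside a real
# jupyterhub_config.py this would instead be:
#     apply_team_settings(get_config(), [...], "/mnt/team-share")
demo = SimpleNamespace(Spawner=SimpleNamespace(), Authenticator=SimpleNamespace())
apply_team_settings(demo, ["alice", "bob"], "/mnt/team-share")
print(demo.Spawner.notebook_dir)
```

This assumes each user's files are already mounted on the server under a folder named after their login; how that mount is achieved depends on the network setup.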

I have not implemented this strategy (making do with Option 1 for now!) but it certainly makes sense.

Heath Raftery