
I am running Docker on macOS. I have a container in which I run JupyterLab. A notebook in JupyterLab has to save some files. Small files are saved with no problem, but large files (~3 GB) are a problem.

I have set Docker resources to 20 GB RAM and 150 GB disk space with 3 GB swap. When the notebook runs, it consumes about 8 GB RAM. When I go to save the 3 GB (XML) file (I know the size because I have run the notebook successfully outside of Docker), I can watch the memory consumption of the container steadily climb to just over 19 GB, at which point the kernel dies and a restart is required. The file is created, but it contains 0 B.

No error is thrown (apart from the "kernel died" pop-up), but just before the kernel dies the container logs show: Starting buffering for xxxxxxx.

Judging by the file and data size alone, there seems to be more than enough memory to accommodate the data and the save. What could be making the Docker memory consumption climb like this? Is it buffering something in memory repeatedly?

To replicate:

  1. Create a Docker container FROM --platform=linux/amd64 jupyter/minimal-notebook:python-3.10.11 and launch JupyterLab in it.
  2. In a notebook, import networkx (2.8.7) and create a graph with 1000000 nodes and 632278 edges.
  3. Save the created graph with nx.write_graphml(graph, path), where the path isn't important but can be the same directory as the notebook (see the sketch after this list).
  4. Monitor the VM RAM usage with Docker Desktop or docker stats at the CLI.
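A minimal sketch of steps 2-3, assuming a random graph of the stated size is an acceptable stand-in for the original data (the graph generator, seed, and output file name are placeholders, not from the original post):

    import networkx as nx

    # Build a graph of the reported size: 1,000,000 nodes and 632,278 edges.
    # The exact edge structure should not matter for reproducing the memory behaviour.
    graph = nx.gnm_random_graph(1_000_000, 632_278, seed=42)

    # Save it as GraphML next to the notebook; the file name is arbitrary.
    nx.write_graphml(graph, "test_graph.graphml")

Running docker stats in a second terminal while the write is in progress lets you watch the container's memory usage (step 4).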

In my case, on a Mac Pro M1, this process takes 11 min and results in a ~500 MB file. However, container memory consumption increases by more than 4 GB during the operation. It looks like the file contents are built up in memory first and then dumped to disk. If that is the case, then based on this post, unless a way can be found to stream to the file, say in batches, this issue is not resolvable.
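One avenue worth testing (an assumption on my part, not something I have verified in this container): networkx also exposes nx.write_graphml_lxml, which serializes through lxml's incremental xmlfile interface rather than building the whole document with ElementTree first, and so may hold less of the XML in memory at once. A sketch, assuming lxml is installed in the container and graph is the graph from the steps above:

    import networkx as nx

    # Requires lxml inside the container (e.g. pip install lxml).
    # write_graphml_lxml serializes incrementally via lxml.etree.xmlfile,
    # which may avoid holding the entire XML document in memory at once.
    # Whether this actually lowers peak container memory for a ~3 GB file
    # is untested here and would need to be measured with docker stats.
    nx.write_graphml_lxml(graph, "test_graph.graphml")

Note that if lxml is already installed, nx.write_graphml may be dispatching to this writer anyway, in which case the memory profile would be unchanged.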

