
Running docker on the host command line, I can run a command in a container that downloads a set of files, and shares those files back to host via a shared volume:

docker run --rm --volume "${PWD}":/contentmine --tty --interactive psychemedia/contentmine getpapers -q aardvark -o /contentmine/aardvark -x

What I would like to do is to be able to run the same command from within a Jupyter container created using a docker-compose.yaml file of the form:

notebook:
  image: jupyter/notebook
  ports:
    - "8899:8888"
  volumes:
    - ./notebooks:/notebooks
    - /var/run/docker.sock:/var/run/docker.sock
  privileged: true 

In a Jupyter notebook code cell, I tried running:

#Make sure docker is available in the Jupyter container
!apt-get update
!apt-get install -y docker.io

!mkdir -p downloads
#Run a download command in another container and share the downloaded files back
! docker run --rm --volume "${PWD}/downloads":/contentmine --tty --interactive psychemedia/contentmine getpapers -q aardvark -o /contentmine/aardvark -x 

I can see that files are downloaded somewhere, but I don't know where. Are they downloaded into the Docker VM context outside the Jupyter container? How can I mount a directory from my notebook container within the temporary container I'm using to run the file-downloading command-line container?

As a part 2 to the question, I'd then also want to be able to use the files in the downloads directory as an input to another command-line command run in another container, and again keep a copy of the results in the notebook container's downloads directory:

docker run --rm --volume "${PWD}/downloads":/contentmine --tty --interactive psychemedia/contentmine norma --project /contentmine/aardvark -i fulltext.xml -o scholarly.html --transform nlm2html

Presumably, if there's a quick fix to the first part of the question, the same fix applies to this part?

psychemedia

2 Answers


I think the answer you're looking for involves creating a named data volume container, specifying it as the mount point for downloads/, and then mounting it at creation time in any containers that use it in later sessions.

Larry

To answer my own question, I think I was making a mistake in naming the linked data volume container.

This seems to work - from notebookdockercli/docker-compose.yml:

notebook:
  image: jupyter/notebook
  ports:
    - "8899:8888"
  volumes_from:
    - contentmineshare

  volumes:
    - ./notebooks:/notebooks
    - /var/run/docker.sock:/var/run/docker.sock
  privileged: true 

contentmineshare:
  image: psychemedia/contentmine 
  volumes:
    - /contentmine

Then in a notebook code cell I can run:

!apt-get update
!apt-get install -y docker.io

then run the docker CLI command:

! docker run --rm --volumes-from notebookdockercli_contentmineshare_1 --tty --interactive psychemedia/contentmine getpapers -q rhinocerous -o /contentmine/rhinocerous -x

I can then see the files:

!ls  /contentmine/rhinocerous/

The issue I had was using the wrong --volumes-from name. (I'm not sure how to pick the name up automatically?)
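Inside a container, the HOSTNAME environment variable typically holds the short container id, so one way to pick the name up automatically is to ask the daemon which container matches it. This is only a sketch, assuming the docker-py 1.x Client API; own_container_name and strip_leading_slash are made-up helper names:

```python
import os

def strip_leading_slash(name):
    # The API reports container names with a leading '/',
    # e.g. '/notebookdockercli_notebook_1'
    return name.lstrip('/')

def own_container_name():
    # Sketch: look up the current container via its short id,
    # which is what HOSTNAME defaults to inside a container.
    import docker  # docker-py 1.x
    cli = docker.Client(base_url='unix://var/run/docker.sock')
    matches = cli.containers(filters={'id': os.environ['HOSTNAME']})
    if matches:
        return strip_leading_slash(matches[0]['Names'][0])
    return None
```

The returned name could then be passed straight to --volumes-from instead of hard-coding notebookdockercli_notebook_1.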

For creating a Docker IPython magic, it would probably be cleaner to use docker-py to create a data volume container for use by the notebook in syncing files with a command-line container.

The above route defined a named data volume container linked to the notebook container at startup by docker compose. It's more flexible to not have this requirement.

If we know the name of the notebook container we're in, and we know the mount point of a shared directory, we can find the path to the directory that can be mounted as a volume when calling the command-line container:

import docker

def getPath(container, mountdir):
    cli = docker.Client(base_url='unix://var/run/docker.sock')
    if cli.containers(filters={'name': container}):
        return [x['Source'] for x in cli.inspect_container(container)['Mounts']
                if 'Destination' in x and x['Destination'] == mountdir]
    return []

pp=getPath('/notebookdockercli_notebook_1','/notebooks')
DD='{}{}'.format(pp[0],'/testN')
! docker run -v {DD}:/contentmineTest --tty --interactive psychemedia/contentmine getpapers -q rhinocerous -o /contentmineTest/rhinocerous -x

This mounts a specified directory in the notebook container against the output folder from the command-line container.

For some reason I couldn't get docker-py to work for this route. I'd expected to be able to just do this:

cli = docker.Client(base_url='unix://var/run/docker.sock')
container_id = cli.create_container(image='psychemedia/contentmine',
                                volumes='{}{}:{}'.format(pp[0],'/test6','/contentmineTest'),
                                command='getpapers -q rhinocerous -o /contentmineTest/rhinocerous -x')
cli.start(container_id)

But it didn't seem to mount the directory from the notebook container.
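My guess is that the volumes= argument to create_container only declares the mount point inside the container; the host side of the bind has to go into the host_config. A sketch of that route, assuming the docker-py 1.x Client API (bind_spec and run_with_bind are illustrative names, not part of docker-py):

```python
def bind_spec(host_dir, container_dir, mode='rw'):
    # docker-py expects binds as 'host_path:container_path[:mode]' strings
    return '{}:{}:{}'.format(host_dir, container_dir, mode)

def run_with_bind(image, host_dir, container_dir, command):
    # Sketch: volumes= declares the in-container mount point only;
    # the actual host bind is passed via create_host_config(binds=...)
    import docker  # docker-py 1.x
    cli = docker.Client(base_url='unix://var/run/docker.sock')
    container = cli.create_container(
        image=image,
        volumes=[container_dir],
        host_config=cli.create_host_config(
            binds=[bind_spec(host_dir, container_dir)]),
        command=command)
    cli.start(container)
    return container
```

Note that because the docker.sock route talks to the host daemon, host_dir here still needs to be a path the host can see (such as the one recovered by getPath above), not a path inside the notebook container.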

Then it struck me that there was an even quicker way, albeit at the risk of exposing all the notebook container's contents to the command-line container: link an appropriate volume into the command-line container from the notebook container:

! docker run --rm --volumes-from notebookdockercli_notebook_1 psychemedia/contentmine getpapers -q rhinocerous -o /notebooks/maybe/rhinocerous -x

In docker-py:

cli = docker.Client(base_url='unix://var/run/docker.sock')
container_id = cli.create_container('psychemedia/contentmine',
                                host_config=cli.create_host_config( volumes_from='notebookdockercli_notebook_1'),
                                command='getpapers -q rhinocerous -o /notebooks/testX/rhinocerous -x')
cli.start(container_id)

I'm not sure how to remove the container once it has run, though: it may take an arbitrary amount of time, so how do we know when to remove it? start() doesn't seem to accept an equivalent of the docker run --rm switch. I suppose we could name the containers in a particular way, then do housekeeping at the end and remove them all?
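One way to approximate docker run --rm from docker-py is to block on wait(), which returns the container's exit status, and then call remove_container(). A sketch, assuming the docker-py 1.x Client API (run_and_remove and succeeded are illustrative names):

```python
def succeeded(exit_code):
    # docker wait returns the process exit status; 0 means success
    return exit_code == 0

def run_and_remove(image, command, volumes_from=None):
    # Sketch: block until the container exits, then remove it,
    # approximating `docker run --rm` by hand.
    import docker  # docker-py 1.x
    cli = docker.Client(base_url='unix://var/run/docker.sock')
    container = cli.create_container(
        image=image,
        command=command,
        host_config=cli.create_host_config(volumes_from=volumes_from))
    cli.start(container)
    exit_code = cli.wait(container)  # blocks until the process exits
    cli.remove_container(container)
    return exit_code
```

The blocking wait() does tie up the notebook cell for the duration, but for the name-and-sweep alternative, containers could instead be created with a known name prefix and removed in a later housekeeping cell.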

  • Another step - pick up the details of the jupyter container by running: `import os` and then: `cli.containers(filters={'id':os.environ['HOSTNAME']})[0]` – psychemedia May 10 '16 at 20:43