
Right now I have this in a Dockerfile:

ENV NLTK_DATA /nltk_data
RUN python3 -m nltk.downloader -d /nltk_data all
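For what it's worth, the downloader also accepts specific package ids, so the image could stay small if only a few corpora were needed; the package names below are just examples. In my case I actually need everything, so trimming the download isn't an option:

ENV NLTK_DATA /nltk_data
RUN python3 -m nltk.downloader -d /nltk_data punkt wordnet stopwords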

This Python library has several gigabytes of data, and I don't need an extra 5 GB in each Docker image I publish to Amazon ECR. Is there some way to push a shared image to ECR that containers can reference when they run?

I want to do something like:

docker pull ecr-url/shared-image:latest
docker run -v shared-image:/nltk_data:/nltk_data my-image
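From what I understand, the closest supported pattern is pre-populating a named volume: when an empty named volume is mounted at a path where the image already has files, Docker copies those files into the volume on first use. A rough sketch, where the volume name nltk_data is arbitrary:

docker pull ecr-url/shared-image:latest
# first mount of the empty named volume copies the image's /nltk_data into it
docker run --rm -v nltk_data:/nltk_data ecr-url/shared-image:latest true
# later containers just reuse the populated volume
docker run -v nltk_data:/nltk_data my-image

But that still requires starting a container from shared-image once.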

Basically, I don't see why it would be necessary to run the shared image as a container; I just need the data from it. However, this answer (https://stackoverflow.com/a/34093828/1223975) says:

Unfortunately there doesn't seem to be a way to copy files directly from Docker images. You need to create a container first and then copy the file from the container.
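If so, I guess the workaround would look roughly like this; nltk-src is just a throwaway container name, and the trailing true is a no-op command in case the image defines no default CMD:

# create (but never start) a container from the shared image
docker create --name nltk-src ecr-url/shared-image:latest true
# copy the data out to the host, then discard the container
docker cp nltk-src:/nltk_data /nltk_data
docker rm nltk-src
# bind-mount the copied directory into the app container
docker run -v /nltk_data:/nltk_data my-image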

  • Do you run your containers on Amazon ECS? You could check the ECS Docker volumes page: https://docs.aws.amazon.com/en_us/AmazonECS/latest/developerguide/docker-volumes.html – Vadim Ashikhman Jul 02 '19 at 23:56
  • In an Amazon context, I'd tend to keep that data in S3, and give the instance profile an IAM role with the ability to copy the data. Maybe I'd even copy it at boot-up time using a userdata script (see the sketch below). ECR _only_ holds Docker images, and not random other artifacts or data. – David Maze Jul 03 '19 at 00:00
  • Your mount should be only `shared-image:/nltk_data`; why did you add `:/nltk_data` twice? – LinPy Jul 03 '19 at 06:15
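
If I went with David Maze's S3 suggestion, I suppose the userdata script would look roughly like this; my-nltk-bucket is a made-up bucket name, and it assumes the data is already in S3 and the instance role can read it:

#!/bin/bash
# EC2 userdata sketch: fetch the NLTK data once per instance at boot
aws s3 sync s3://my-nltk-bucket/nltk_data /nltk_data

Each container would then bind-mount /nltk_data at run time instead of baking the data into the image.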
