I want to mount an AWS S3 bucket on a Docker container that I am using to run AWS Batch jobs. I have researched several ways of going about this, but I still lack clarity on how to make it work on AWS Batch, which dynamically allocates EC2 instances based on the job definitions. The following are the ideas I have gathered, but I am unsure how to put them together:

  1. I could use the REX-Ray plugin (https://rexray.readthedocs.io/en/v0.9.0/user-guide/docker-plugins/) to mount an S3 bucket as a Docker volume, but I am unsure how to do this on AWS Batch. Should this plugin be part of the Docker image?
  2. I could use s3fs-fuse, but I was told that I won't be able to install or store any of the files from S3 on the EC2 instances that AWS Batch launches, which could then be mounted in Docker. Is there a way to do this by including some code in the AMI that copies files from S3 to the instance? (A sketch of this idea follows the list.)
  3. Are there any other ways I can get this to work?
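
For idea 2, here is roughly what I had in mind, a sketch of a boot-time copy step that could go in the AMI or the compute environment's user data (the bucket and paths are placeholders):

#!/bin/bash
# Hypothetical instance user-data / AMI boot script: copy inputs from S3
# onto the instance so containers can mount the directory as a volume.
# Bucket and paths are placeholders; the instance role needs s3:GetObject.
aws s3 sync s3://my-test-bucket/inputs /mnt/batch-data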

Pardon me if my questions are too basic. I am fairly new to Docker and AWS Batch. Would appreciate any help!

Thanks!

rkm19
    Just as a piece of advice, you should avoid using S3 as a mount. The [EFS](https://aws.amazon.com/efs/) service is designed to be used as an NFS mount. – Chris Williams Nov 09 '20 at 08:57
  • Thank you for the suggestion. I am using a workflow language to run my jobs, and it may not be compatible with EFS. – rkm19 Nov 09 '20 at 09:07
  • You can mount S3 Buckets on EC2 instances via AWS Storage Gateway. Don't really know how that integrates with AWS Batch though... – Robert Kossendey Nov 09 '20 at 09:18
  • `it may not be compatible with EFS` EFS is at least a real FS and behaves that way; S3 has eventual/read-after-write consistency, so really consider whether it's feasible for you (I'd avoid using S3 as a FS). S3 is cheaper storage, but it's not a filesystem; mounting S3 is always a kind of *workaround* with some caveats. Anyway, s3fs-fuse [should work](https://stackoverflow.com/questions/24966347/is-s3fs-not-able-to-mount-inside-docker-container) from inside the container – gusto2 Nov 09 '20 at 10:21

2 Answers

I have personally used s3fs to solve this problem in the past. Using S3 as a mounted filesystem has some caveats that you would be wise to familiarize yourself with, because you are treating something that is not a filesystem as if it were one (a classic leaky-abstraction problem). But if your workflow is relatively simple and has no potential for race conditions, you should be able to do it with some confidence, especially now that, as of December 2020, S3 provides strong read-after-write consistency automatically for all applications.

To answer your other question:

I could use s3fs-fuse, but I was told that I won't be able to install or store any of the files from S3 on the EC2 instances that AWS Batch launches, which could then be mounted in Docker. Is there a way to do this by including some code in the AMI that copies files from S3 to the instance?

If you use s3fs to mount your S3 bucket as a filesystem within Docker, you don't need to worry about copying files from S3 to the instance; indeed, the whole point of using s3fs is that you can access all your files in S3 from the container without having to move them off of S3.

Say, for instance, you mount your S3 bucket `s3://my-test-bucket` to `/data` in the container. You can then run your program like `my-executable --input /data/my-s3-file --output /data/my-s3-output`, as if the input file were right there on the local filesystem. When it's done, the output file will be on S3 at `s3://my-test-bucket/my-s3-output`. This can simplify your workflow and cut down on glue code quite a bit.

My Dockerfile for my s3fs AWS Batch container looks like this:

FROM ubuntu:18.04

# Build dependencies for s3fs-fuse, plus FUSE itself
RUN apt-get -y update && apt-get -y install curl wget build-essential automake libcurl4-openssl-dev libxml2-dev pkg-config libssl-dev libfuse-dev parallel

# Build and install s3fs-fuse from source, then clean up the build artifacts
# (note the `cd ..` before the cleanup; without it, `rm -rf` runs from inside
# the source directory and silently removes nothing)
RUN wget https://github.com/s3fs-fuse/s3fs-fuse/archive/v1.86.tar.gz && \
    tar -xzvf v1.86.tar.gz && \
    cd s3fs-fuse-1.86 && \
    ./autogen.sh && \
    ./configure --prefix=/usr && \
    make && \
    make install && \
    cd .. && \
    rm -rf s3fs-fuse-1.86 v1.86.tar.gz

# Mount point for the S3 bucket
RUN mkdir /data

COPY entrypoint.sh /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]
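
One thing worth calling out: FUSE mounts need privileges that containers don't get by default. To smoke-test the image locally (the tag below is a placeholder; the entrypoint shown next performs the actual mount, and its `-o ecs` credential lookup assumes it is running on ECS/Batch, so a purely local test would need a different credential source), something like this is needed:

# Build the image (tag is a placeholder)
docker build -t my-s3fs-image .

# FUSE needs the SYS_ADMIN capability plus the /dev/fuse device
# (or simply --privileged); without these the s3fs mount will fail
docker run --rm \
    --cap-add SYS_ADMIN --device /dev/fuse \
    my-s3fs-image \
    ls /data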

entrypoint.sh is a convenience for always running the s3fs mount before the main program (this breaks the one-process-per-container paradigm, but I don't think it's cause for major concern here). It looks like this:

#!/bin/bash

# Bucket to mount (hard-coded here; could also be passed in as an environment variable)
bucket=my-bucket

# Mount the bucket with s3fs; `-o ecs` tells s3fs to fetch credentials from the
# ECS container credentials endpoint, i.e. the Batch job's role
s3fs ${bucket} /data -o ecs

echo "Mounted ${bucket} to /data"

# Hand off to the job's actual command
exec "$@"

Note related answer here: https://stackoverflow.com/a/60556131/1583239

qwwqwwq
I am assuming you want to read/write to an S3 bucket. You can do this within your containerized code by using a library like boto3. You will also need to grant the container IAM permissions to access S3 (on Batch, via the job role in the job definition).
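
For instance, a minimal in-container sketch, using the AWS CLI here as the shell equivalent of the boto3 calls mentioned above (the bucket, file names, and executable are placeholders):

# Download an input, process it, upload the result. The job role attached
# to the Batch job definition must allow s3:GetObject / s3:PutObject on
# the bucket (all names below are placeholders).
aws s3 cp s3://my-test-bucket/my-s3-file /tmp/input
my-executable --input /tmp/input --output /tmp/output
aws s3 cp /tmp/output s3://my-test-bucket/my-s3-output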

Vince