101

I am trying to distribute a set of connected applications running in several linked containers that includes a mongo database that is required to:

  • be distributed containing some seed data;
  • allow users to add additional data.

Ideally the data will also be persisted in a linked data volume container.

I can get the data into the mongo container using a mongo base instance that doesn't mount any volumes (dockerhub image: psychemedia/mongo_nomount - this is essentially the base mongo Dockerfile without the VOLUME /data/db statement) and a Dockerfile config along the lines of:

ADD . /files
WORKDIR /files
RUN mkdir -p /data/db && mongod --fork --logpath=/tmp/mongodb.log && sleep 20 && \
mongoimport  --db testdb --collection testcoll  --type csv --headerline --file ./testdata.csv  #&& mongod --shutdown

where ./testdata.csv is in the same directory (./mongo-with-data) as the Dockerfile.

My docker-compose config file includes the following:

mongo:
  #image: mongo
  build: ./mongo-with-data
  ports:
    - "27017:27017"
  #Ideally we should be able to mount this against a host directory
  #volumes:
  #  - ./db/mongo/:/data/db
  #volumes_from:
  #  - devmongodata

#devmongodata:
#    command: echo created
#    image: busybox
#    volumes: 
#       - /data/db

Whenever I try to mount a VOLUME it seems as if the original seeded data - which is stored in /data/db - is deleted. I guess that when a volume is mounted to /data/db it replaces whatever is there currently.

That said, the Docker user guide says: "Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization." So I expected the data to persist if I placed the VOLUME command after the seeding RUN command.

So what am I doing wrong?

The long view is that I want to automate the build of several linked containers, and then distribute a Vagrantfile/docker-compose YAML file that will fire up a set of linked apps, that includes a pre-seeded mongo database with a (partially pre-populated) persistent data container.

psychemedia
  • I guess what I want to do in the build phase is mount the db container onto a new data volume container so that the data in the db container directory is placed into the data volume, rather than mount the data volume container onto the db container, which overwrites the data I just imported. – psychemedia Jul 03 '15 at 17:08
  • Having established a data volume container with initially seeded data, I can destroy the original database container and then just connect a simple mongodb container to the data volume container for end user use. The heart of the original question is now this: what's the easiest way to build and populate a data volume container that a mongod container could connect to? – psychemedia Jul 03 '15 at 17:54
  • With `Rails` I use `docker-compose run container_name rake db:seed` – albttx Jun 10 '16 at 07:56
  • If using docker-compose is not a requirement, you could create a derived mongo image that configures the db, including seed data, on initialization. This [solution](http://stackoverflow.com/a/43567596/1089228) works well for me. – Steve Tarver Apr 23 '17 at 05:19

7 Answers

137

I do this using another docker container whose only purpose is to seed mongo, then exit. I suspect this is the same idea as ebaxt's, but when I was looking for an answer to this, I just wanted to see a quick-and-dirty, yet straightforward, example. So here is mine:

docker-compose.yml

mongodb:
  image: mongo
  ports:
    - "27017:27017"

mongo-seed:
  build: ./mongo-seed
  depends_on:
    - mongodb

# my webserver which uses mongo (not shown in example)
webserver:
  build: ./webserver
  ports:
    - "80:80"
  depends_on:
    - mongodb

mongo-seed/Dockerfile

FROM mongo

COPY init.json /init.json
CMD mongoimport --host mongodb --db reach-engine --collection MyDummyCollection --type json --file /init.json --jsonArray

mongo-seed/init.json

[
  {
    "name": "Joe Smith",
    "email": "jsmith@gmail.com",
    "age": 40,
    "admin": false
  },
  {
    "name": "Jen Ford",
    "email": "jford@gmail.com",
    "age": 45,
    "admin": true
  }
]
Jeff Fairley
  • What are the pros & cons of using an external docker to seed? – Augustin Riedinger May 26 '16 at 11:12
  • I prefer to keep things separate and simple, and I've found that usually lends me the most flexibility... For instance... if I want to change something in my seed file, I'll need to build it again. If my seed were the same image as my running mongo instance, I'd lose my mongo data due to the reset. Obviously, I could export and import, but that's more work. – Jeff Fairley May 26 '16 at 16:21
  • Running the above form using `mongorestore` seems to give an `exited with code 0` error? – psychemedia Apr 04 '17 at 13:20
  • Did you happen to run into issues where you are seeding your DB multiple times on different runs? – Vasif Apr 25 '17 at 22:07
  • @Vasif, actually I have seen that. I never did figure out why it happens, though. My seed file even creates a unique constraint that should be violated by data being imported a second time, yet the data still goes in... It's a head-scratcher. – Jeff Fairley Apr 25 '17 at 22:34
  • I have tried your solution changing the command to `mongorestore --host=127.0.0.1 -d rigel_db /dump/rigel_db/` and it yields `Failed: error connecting to db server: no reachable servers`. Do you know if there's something different with mongorestore? – Camilo Sampedro Jun 02 '17 at 17:22
  • Camilo: The issue may be that the database in the mongodb container hasn't fully started up when mongo-seed is started. I would put a "depends_on: mongodb" in the mongo-seed config. However, that will only wait for the mongodb container to start up, not the actual database inside it. A "restart: on-failure" command on the mongo-seed config will make it try again if the database isn't available. I've seen it try three or four times before the database is available. – k7n4n5t3w4rt Aug 29 '17 at 23:16
  • _If my seed were the same image as my running mongo instance, I'd lose my mongo data due to the reset_ That implies you are using Docker the wrong way: you are not mounting your data folders to a volume, and are instead letting MongoDB write inside the same container as the application itself. – Tseng Oct 22 '18 at 08:17
  • What should I do if I have multiple collections in my seed file? – Kristoffer Tølbøll Aug 29 '21 at 17:58
  • I'm always getting: error connecting to host: could not connect to server: connection() error occured during connection handshake: auth error: sasl conversation error: unable to authenticate using mechanism "SCRAM-SHA-1": (AuthenticationFailed) Authentication failed. Even if I'm running the command manually inside the mongodb container !! – Ebram Shehata Apr 15 '22 at 13:06
  • I've updated the code example to use "depends_on" instead of the long deprecated "links" attribute. – Jeff Fairley Apr 25 '22 at 18:36
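
The retry approach suggested in the comments above could be sketched as follows. This is a minimal sketch, assuming the same `mongodb` and `mongo-seed` services as in the answer and a Compose file with a top-level `services:` key; it is not the answerer's own configuration.

```yaml
services:
  mongodb:
    image: mongo
    ports:
      - "27017:27017"

  mongo-seed:
    build: ./mongo-seed
    depends_on:
      - mongodb
    # depends_on only orders container start-up; it does not wait for mongod
    # to accept connections. If mongoimport fails because the database is not
    # ready yet, the container exits non-zero and Compose starts it again.
    restart: on-failure
```

Once the import succeeds the seed container exits with code 0 and is not restarted. Note that each later `docker-compose up` runs the import again, which may explain the duplicate seeding mentioned in the comments; the `--drop` flag of `mongoimport` avoids duplicates, but it also discards any documents added to that collection since the last seed.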
22

I have found it useful to use custom Docker images together with volumes, instead of creating another container for seeding.

File Structure

.
├── docker-compose.yml
├── mongo
│   ├── data
│   ├── Dockerfile
│   └── init-db.d
│       └── seed.js

Every path in docker-compose.yml is relative to the location of docker-compose.yml; paths in the Dockerfile are relative to its build context (./mongo).

Dockerfile

FROM mongo:3.6

COPY ./init-db.d/seed.js /docker-entrypoint-initdb.d/

docker-compose.yml

version: '3'

services:
  db:
    build: ./mongo
    restart: always
    volumes:
      - ./mongo/data:/data/db #Helps to store MongoDB data in `./mongo/data`
    environment:
      MONGO_INITDB_ROOT_USERNAME: {{USERNAME}}
      MONGO_INITDB_ROOT_PASSWORD: {{PWD}}
      MONGO_INITDB_DATABASE: {{DBNAME}}

seed.js

// Since seeding in Mongo is done in alphabetical order, it is important to keep
// file names alphabetically ordered if multiple files are to be run.

db.test.drop();
db.test.insertMany([
  {
    _id: 1,
    name: 'Tensor',
    age: 6
  },
  {
    _id: 2,
    name: 'Flow',
    age: 10
  }
])

docker-entrypoint-initdb.d can also be used to create users and to perform other MongoDB administration tasks: just add a JS script (for example, one that calls createUser), named so that it sorts alphabetically into the right position.
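
As a variation, the init scripts could be bind-mounted into the stock image instead of being copied into a custom one. The following is a rough sketch, assuming the same directory layout as above; the database name `exampledb` is a hypothetical placeholder. Keep in mind that the official image only runs these scripts when the data directory is empty, so they do not re-run against an already populated `./mongo/data`.

```yaml
version: '3'

services:
  db:
    image: mongo:3.6
    restart: always
    volumes:
      - ./mongo/data:/data/db
      # Scripts in this directory run in alphabetical order, and only on
      # first initialization (that is, when /data/db is empty).
      - ./mongo/init-db.d:/docker-entrypoint-initdb.d:ro
    environment:
      MONGO_INITDB_DATABASE: exampledb   # hypothetical database name
```

This avoids rebuilding an image whenever a seed script changes, at the cost of requiring the script directory to be distributed alongside the Compose file.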

For more details on how to customize MongoDB Docker service, read this

Also, keep your usernames and passwords out of public view. DO NOT push credentials to a public git repository; use Docker Secrets instead. Also read this tutorial on Secrets.

Do note, it is not necessary to go into docker-swarm mode to use secrets. Compose Files supports secrets as well. Check this

Secrets can also be used in MongoDB Docker Services
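
A minimal sketch of how file-based secrets might be wired into the Compose file above. It assumes Compose file format 3.1 or later and relies on the official mongo image's `_FILE` variants of the root username and password variables; the secret names and the `./secrets/*.txt` paths are hypothetical.

```yaml
version: '3.1'

services:
  db:
    build: ./mongo
    restart: always
    volumes:
      - ./mongo/data:/data/db
    environment:
      # The image reads each value from the file named here instead of
      # taking it directly from the environment.
      MONGO_INITDB_ROOT_USERNAME_FILE: /run/secrets/mongo_root_username
      MONGO_INITDB_ROOT_PASSWORD_FILE: /run/secrets/mongo_root_password
    secrets:
      - mongo_root_username
      - mongo_root_password

secrets:
  mongo_root_username:
    file: ./secrets/mongo_root_username.txt   # hypothetical path, keep out of git
  mongo_root_password:
    file: ./secrets/mongo_root_password.txt   # hypothetical path, keep out of git
```

Each secret is mounted inside the container at `/run/secrets/<name>`, so the credentials never appear in the Compose file itself.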

phoenisx
  • Could you clarify where `db` comes from in `db.test.drop();`? – while1pass Mar 08 '19 at 11:23
  • @while1pass Please check https://docs.mongodb.com/manual/tutorial/write-scripts-for-the-mongo-shell/. You can create your own `db` connection, or use the default connection provided while the script runs (which I guess is the root user for the init script here). – phoenisx Mar 10 '19 at 03:21
10

This answer is based on @Jeff Fairley's answer, updated to follow the newer Docker docs.

docker-compose.yml

version: "3.5"

services:
  mongo:
    container_name: mongo_dev
    image: mongo:latest
    ports:
      - 27017:27017
    networks:
      - dev

  mongo_seed:
    container_name: mongo_seed
    build: .
    networks:
      - dev
    depends_on:
      - mongo

networks:
  dev:
    name: dev
    driver: bridge

Dockerfile

FROM mongo:latest
COPY elements.json /elements.json
CMD mongoimport --host mongo --db mendeleev --collection elements --drop --file /elements.json --jsonArray

You probably need to rebuild current images.

Shivang Patel
while1pass
2

You can use this image, which provides a Docker container for many jobs (import, export, dump).

Look at the example using docker-compose

bwnyasse
1

You can use Mongo Seeding Docker image.

Why?

  • You have the Docker image ready to go
  • You are not tied to JSON files - JavaScript and TypeScript files are supported as well (including optional model validation with TypeScript)

Example usage with Docker Compose:

version: '3'
services:
  database:
    image: 'mongo:3.4.10'
    ports:
    - '27017:27017'
  api:
    build: ./api/
    command: npm run dev
    volumes: 
    - ./api/src/:/app/src/
    ports:
    - '3000:3000'
    - '9229:9229'
    links:
    - database
    depends_on:
    - database
    - data_import
    environment: 
    - &dbName DB_NAME=dbname
    - &dbPort DB_PORT=27017 
    - &dbHost DB_HOST=database
  data_import:
    image: 'pkosiec/mongo-seeding:3.0.0'
    environment:
    - DROP_DATABASE=true
    - REPLACE_ID=true
    - *dbName
    - *dbPort
    - *dbHost
    volumes:
    - ./data-import/dev/:/data-import/dev/
    working_dir: /data-import/dev/data/
    links:
    - database
    depends_on:
    - database

Disclaimer: I am the author of this library.

pkosiec
  • @psychemedia I've updated the example for latest version of Mongo Seeding. Happy seeding! – pkosiec Nov 02 '18 at 21:17
  • does this tool support adding indexes and other functionality or only documents? I checked out your repo and didn't see any mention in docs or examples of this. – PJH May 22 '20 at 21:55
  • @PJH Sorry, I missed your comment. Currently Mongo Seeding supports only documents, but feel free to suggest new features in GitHub issue :-) – pkosiec Oct 14 '20 at 19:00
1

Here is a working Docker Compose setup that seeds a MongoDB database. Use the files below to seed the database.

Dockerfile

FROM mongo:3.6.21

COPY init.json /init.json

CMD mongoimport --uri mongodb://mongodb:27017/testdb --collection users --type json --file /init.json --jsonArray

docker-compose.yml

version: "3.7"
services:
  mongodb:
    container_name: mongodb
    image: mongo:3.6.21
    environment:
      - MONGO_INITDB_DATABASE=testdb
    volumes:
      - ./data:/data/db
    ports:
      - "27017:27017"

  mongo_seed:
    build: ./db
    depends_on:
      - mongodb
0

To answer my own question:

  • simple YAML file to create simple mongo container linked to a data volume container, fired up by Vagrant docker compose.
  • in the Vagrantfile, code along the lines of:

config.vm.provision :shell, :inline => <<-SH
  docker exec -it -d vagrant_mongo_1 mongoimport --db a5 --collection roads --type csv --headerline --file /files/AADF-data-minor-roads.csv
SH

to import the data.

Package the box.

Distribute the box.

For the user, a simple Vagrantfile to load the box and run a simple docker-compose YAML script to start the containers and mount the mongo db against the data volume container.
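
A minimal sketch of what such a user-facing Compose file might look like. It assumes a named volume (`mongodata`, a hypothetical name) in place of the separate busybox data volume container, which is a substitution rather than what I actually used.

```yaml
services:
  mongo:
    image: mongo
    ports:
      - "27017:27017"
    volumes:
      # Database files live in the named volume, so they survive removal
      # and re-creation of the mongo container.
      - mongodata:/data/db

volumes:
  mongodata:
```

With this layout the mongo container can be destroyed and replaced without touching the seeded data, as long as the volume itself is not removed.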

Daniel Serodio
psychemedia