4

I am new to dbt and am trying to build a Docker container in which I can run dbt commands directly. I have a file that exports env variables (envs.sh) and looks like:

export DB_HOST="secret"
export DB_PWD="evenabiggersecret"

My packages.yml looks like:

packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.6.2

I structured my Dockerfile like:

FROM fishtownanalytics/dbt:0.19.0b1
# Define working directory
WORKDIR /usr/app/profile/
ENV DBT_DIR /usr/app
ENV DBT_PROFILES_DIR /usr/app
# Load ENV Vars
COPY ./dbt ${DBT_DIR}
# Load env variables and install packages
COPY envs.sh envs.sh
RUN . ./envs.sh \
 && dbt deps # Exporting envs to avoid profile not found errors when install deps

However, when I run dbt run inside the Docker container, I get the error: 'dbt_utils' is undefined. When I manually run dbt deps, the issue goes away and dbt run succeeds. Am I missing something when I install the dependencies during the build?

Update: In other words, running dbt deps while building the Docker image seems to have no effect, so I have to run it manually (when I do docker run, for example) before I can start my workflows. This issue does not happen when I use a Python image (not the image from fishtown-analytics).

alt-f4

5 Answers

3

Because the base image in the Dockerfile (fishtownanalytics/dbt:0.19.0b1) includes a VOLUME declaration for /usr/app, any changes that later build steps make inside that directory are discarded (see the Dockerfile reference notes on VOLUME). The working directory, /usr/app/profile/, sits under /usr/app, so the modules that the RUN dbt deps command downloads and installs are thrown away rather than added to the final image. The Python image doesn't have the same VOLUME declaration, so it doesn't have the same issue.

To get around this, you can change the working directory to a path outside the declared volume (e.g., /usr/dbt).
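
As a rough sketch, the Dockerfile from the question could be adapted along these lines. The /usr/dbt path is only an example, and the layout assumes that the copied ./dbt directory holds dbt_project.yml, packages.yml and the profile; adjust the paths if your project is organised differently.

# /usr/dbt is an arbitrary example path outside the /usr/app volume
FROM fishtownanalytics/dbt:0.19.0b1
ENV DBT_DIR=/usr/dbt
ENV DBT_PROFILES_DIR=/usr/dbt
WORKDIR ${DBT_DIR}
# Copy the dbt project (assumed to contain dbt_project.yml and packages.yml)
COPY ./dbt ${DBT_DIR}
# Source the env vars so the profile resolves, then install the packages
COPY envs.sh envs.sh
RUN . ./envs.sh \
 && dbt deps

With this layout, dbt deps writes its modules under /usr/dbt, which is not a declared volume, so they should be kept in the image layer.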

Sean F
2

Running dbt deps is a necessary step in preparing your dbt environment, so you should feel fine invoking dbt deps in the Dockerfile prior to dbt run.

I think, however, that your intention is getting lost in the RUN instruction on the last line: either convert that last RUN command to a CMD instruction, or perform a RUN dbt deps by itself in an earlier step. (See this question for more detail on the differences between RUN and CMD.)

And, for what it's worth: dbt Cloud, the hosted SaaS build environment for dbt, also runs dbt deps as one of its standard steps for all dbt build jobs -- meaning executing at run time, every time, similar to Docker's CMD.

Nick S.
  • Hi Nick = ) The issue I am having is that running `dbt deps` in the Dockerfile itself seems to have no effect. I need to manually run when I run the container. However when I use a Python image and install dbt it seems to work there – alt-f4 Dec 29 '20 at 09:47
  • I have also updated my question to better illustrate the problem – alt-f4 Dec 29 '20 at 09:53
  • @alt-f4 I updated the answer correspondingly - let me know if this helps! I think the state of the image after your last `RUN` is simply not being 'committed' so that it's as if `dbt deps` never ran... – Nick S. Dec 29 '20 at 11:48
1

@alt-f4

Fundamentally, dbt deps installs a local copy of the packages into your project's dbt_modules/ directory.

By default that directory is included in the .gitignore so maybe try:

  1. remove dbt_modules from .gitignore
  2. install via dbt deps to repo
  3. commit that version of the modules into your repo?

Might work but I'd recommend version locking each package in the packages.yml if you go that route.

sgdata
1

There is a different approach to solving this volume issue affecting dbt_packages and dbt deps, which I came across while tinkering around.

I have built my own custom Dockerfile, but faced similar issues when trying to mount a VOLUME on the same directory of the active dbt project. Safe to say, RUN dbt deps was not persisting into the running container.

I was able to come up with a different approach to this problem that solved my use case.

You can simply update where dbt deps installs its packages for your dbt project, using the packages-install-path.

In the case of the current Dockerfile, which has a VOLUME at /usr/app, simply pointing packages-install-path in dbt_project.yml to a directory outside /usr/app should be enough. I used my service account's home directory: /home/service-account-user/dbt-packages.

✅ TLDR SOLUTION ✅

  1. We know the VOLUME is pointing to the /usr/app directory
  2. To get around this, set packages-install-path in dbt_project.yml to a directory outside /usr/app. Really, any path not associated with the VOLUME will do.
  3. Now, when you use RUN dbt deps, it installs the packages in that directory and they persist across the Docker layers.

The benefit here is that this is just a modification to your dbt_project.yml, not necessarily to the actual Dockerfile.
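
For illustration, the relevant line in dbt_project.yml might look like the fragment below. The path is the one mentioned above and is only an example. As far as I know, packages-install-path is the key used from dbt 1.0 onward, and older releases such as the 0.19 image in the question used modules-path instead, so check which key your dbt version expects.

# dbt_project.yml (fragment; the rest of the project config is unchanged)
# Example path only: any directory outside the /usr/app volume should work
packages-install-path: /home/service-account-user/dbt-packages

The user that runs dbt deps during the build needs write access to whichever directory you pick.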

Frankly, dbt labs should just update their image to place their packages outside of that directory to avoid this problem for everyone.

M.Brody
0

You need to add the relevant packages to packages.yml. I don't think they will be directly available in the Fishtown image. You might want to list the required packages in a packages.yml file locally and copy it to the dbt dir. After that, dbt deps should be able to install those packages.

gurjarprateek