0

Im trying to dockerize a python aplication that runs on a conda environment and querys a postgresql database using sqlalchemy. Ive managed to run a dockerize my conda env and run jupyter on it, ive also managed to run a postgres image separately and ive built a docker-compose.yml with both services, but i cant get them to work together.

Im running the following code on a jupyter notebook with the conda env installed and im getting (psycopg2.OperationalError) could not connect to server: Connection refused.

import sqlalchemy
engine = sqlalchemy.create_engine('postgres://admin:admin@localhost:5432/postgresdb')
engine.connect()

Im quite new to docker, so im sure im getting some configuration wrong, any help would be great.

Configuration

  • Docker compose:
version: "3"

services:
  database:
    container_name: postgres
    image: 'bitnami/postgresql:12'
    environment:
      - POSTGRES_USER=admin
      - POSTGRES_PASSWORD=admin
      - POSTGRES_HOST=localhost
      - POSTGRES_PORT=5432
      - POSTGRES_DB=postgresdb
    ports:
      - "5432:5432"
    volumes:
      - db_volume:/var/lib/postgresql

  jupyter-conda:
    build: .
    ports:
      - "8888:8888"
    depends_on:
      - database

volumes:
  db_volume:
  • Dockerfile:
FROM ubuntu

SHELL [ "/bin/bash", "--login", "-c" ]

# Create a non-root user
ENV USER arquimedes
ENV UID 1000
ENV GID 1000
ENV HOME /home/$USER
RUN adduser --disabled-password \
    --gecos "Non-root user" \
    --uid $UID \
    --home $HOME \
    $USER

RUN apt-get update
RUN apt-get install wget -y

COPY environment.yml requirements.txt /tmp/
RUN chown $UID:$GID /tmp/environment.yml /tmp/requirements.txt

COPY postBuild /usr/local/bin/postBuild.sh
RUN chown $UID:$GID /usr/local/bin/postBuild.sh && \
    chmod u+x /usr/local/bin/postBuild.sh

COPY entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chown $UID:$GID /usr/local/bin/entrypoint.sh && \
    chmod u+x /usr/local/bin/entrypoint.sh

USER $USER
# install miniconda
ENV MINICONDA_VERSION latest
ENV CONDA_DIR $HOME/miniconda3
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-$MINICONDA_VERSION-Linux-x86_64.sh -O ~/miniconda.sh && \
    chmod +x ~/miniconda.sh && \
    ~/miniconda.sh -b -p $CONDA_DIR && \
    rm ~/miniconda.sh
# make non-activate conda commands available
ENV PATH=$CONDA_DIR/bin:$PATH
# make conda activate command available from /bin/bash --login shells
RUN echo ". $CONDA_DIR/etc/profile.d/conda.sh" >> ~/.profile
# make conda activate command available from /bin/bash --interative shells
RUN conda init bash

# create a project directory inside user home
ENV PROJECT_DIR $HOME/conda-oracle
RUN mkdir $PROJECT_DIR
WORKDIR $PROJECT_DIR

# build the conda environment
ENV ENV_PREFIX $PROJECT_DIR/env
RUN conda update --name base --channel defaults conda && \
    conda env create --prefix $ENV_PREFIX --file /tmp/environment.yml
# run the postBuild script to install any JupyterLab extensions
RUN conda activate $ENV_PREFIX && \
    /usr/local/bin/postBuild.sh && \
    conda deactivate

ENTRYPOINT [ "/usr/local/bin/entrypoint.sh" ]

# expose port the api
EXPOSE 8888

# password for jupyter
ENV JUPYTER_TOKEN eureka

# copy source files 
COPY --chown=$UID:$GID /src $PROJECT_DIR/src

# inicializar el container con jupyter
CMD [ "jupyter", "lab", "--no-browser", "--ip", "0.0.0.0" ]

docker-container up output

database-output

EDIT: Part 2

So, ive been able to connect to the database now thanks to jabbsons answer, but im now having trouble interacting with the db. Im trying to pusblish a database from pandas and then query it, im able to pusblish but get error querying, anybody has any idea what im doing wrong? Thank you so much for your time! Code bellow.

import pandas as pd
from datetime import date
from typing import Dict, Optional, Any
from sqlalchemy import create_engine
from sqlalchemy.types import CHAR, INTEGER, DATE

CREDENTIALS = 'postgresql://admin:admin@database:5432/postgresdb' 

def create_table_from_pandas(df: pd.DataFrame, table_name: str, schema: Dict[str, Any], chunksize: int = None) -> None:
    conection = create_engine(CREDENTIALS)

    # if_exists='replace' borra primero todo lo previo que haya en la tabla temporal
    df.to_sql(
        name=table_name,
        con=conection,
        if_exists="replace",
        index=False,
        dtype=schema,
        chunksize=chunksize,
    )

def execute_query(query: str, params: Optional[Dict[str, str]] = None) -> pd.DataFrame:
    conection = create_engine(CREDENTIALS)

    df_result = pd.read_sql(query, params=params, con=conection)

    return df_result

df_test = pd.DataFrame({'COL1':[1,2,3,4,5], 'COL2':['a','b','c','d','e'], 'COL3':[date.today()]*5})

df_scheema = {
    "COL1": INTEGER(),
    "COL2": CHAR(20),
    "COL3": DATE(),
}

create_table_from_pandas(df_test, 'DFTEST', df_scheema)

Everything fine untill this point, then I run:

execute_query("SELECT * FROM DFTEST")

And get: UndefinedTable: relation "dftest" does not exist

If I run

execute_query("SELECT * FROM information_schema.tables WHERE table_name = 'DFTEST'")

I can see the table screenchot pandas table

Again, thanks so much for your time, Im still pretty new to all this.

1 Answers1

0

The issue you are connecting to the localhost from your jupyter notebook. If you want to connect to another service, use its name:

postgres://admin:admin@database:5432/postgresdb

For the second question, run

execute_query("SELECT * FROM \"DFTEST\"")

or

execute_query('SELECT * FROM "DFTEST"')

more details here

jabbson
  • 4,390
  • 1
  • 13
  • 23
  • So, been able to connect now thanks to your comment, but still having trouble creating and querying tables in this settup. Added an edit to my original post. Any ideas what im doing wrong now? Sorry for beeing a pain hehe – Guzmán Vitar Nov 14 '21 at 16:18
  • You are welcome, glad it helped. It is best to have one issue per post, this allows for easier search for people with similar issues later on. Another suggestion I have is to have a minimal reproducible example, so instead of posting a lot of code that only you are familiar with, try to trim it down to a very basic example of what you are trying to figure out, people tend to care much more for issues they don't have to spend half an hour trying to figure out. I understand, It is more time consuming for you, but this is my advice. I will try to take a look later if I have time. – jabbson Nov 14 '21 at 17:08
  • @GuzmánVitar updated the answer. – jabbson Nov 14 '21 at 17:36
  • Thank you good sir, I will certainly take into consideration your advice for the next time. Actually, when I searched for the second problem alone I found the answer, so definetively solid advice. Again, thank you so much for your time. – Guzmán Vitar Nov 14 '21 at 18:19