Summarize the problem:

The Python package opens the PDFs in a batch folder, reads the first page of each one, matches keywords, and places the matching PDFs in a source folder for the OCR scripts to pick up. The first script to receive all the PDFs is MainBankClass.py. I am trying to use a docker-compose file to put all these Python scripts on the same network and volume so that each OCR script starts scanning bank statements once pre-processing is done. This link is the closest I have found so far, but I seem to have missed part of it. The OCR scripts are invoked with runpy.run_path(path_name='ChaseOCR.py'), so copies of them sit in the same directory as __init__.py. Here is the filesystem structure:

BankStatements
 ┣ BankofAmericaOCR
 ┃ ┣ BancAmericaOCR.py
 ┃ ┗ Dockerfile.bankofamerica
 ┣ ChaseBankStatementOCR
 ┃ ┣ ChaseOCR.py
 ┃ ┗ Dockerfile.chase
 ┣ WellsFargoStatementOCR
 ┃ ┣ Dockerfile.wellsfargo
 ┃ ┗ WellsFargoOCR.py
 ┣ BancAmericaOCR.py
 ┣ ChaseOCR.py
 ┣ Dockerfile
 ┣ WellsFargoOCR.py
 ┣ __init__.py
 ┗ docker-compose.yml

What I've tried so far:

In docker-compose.yml:

version: '3'

services:
    mainbankclass_container:
        build: 
            context: '.'
            dockerfile: Dockerfile
        volumes: 
            - /Users:/Users
        #links:
        #    - "chase_container"
        #    - "wellsfargo_container"
        #    - "bankofamerica_container"
    chase_container:
        build: .
        working_dir: /app/ChaseBankStatementOCR
        command: ./ChaseOCR.py
        volumes: 
            - /Users:/Users
    bankofamerica_container:
        build: .
        working_dir: /app/BankofAmericaOCR
        command: ./BancAmericaOCR.py
        volumes: 
            - /Users:/Users
    wellsfargo_container:
        build: .
        working_dir: /app/WellsFargoStatementOCR
        command: ./WellsFargoOCR.py
        volumes: 
            - /Users:/Users

The Dockerfile in each bank folder is the same except for CMD, which changes accordingly. For example, in the ChaseBankStatementOCR folder:

FROM python:3.7-stretch
WORKDIR /app
COPY . /app
# CMD changes accordingly for the other two bank scripts
CMD ["python3", "ChaseOCR.py"]

The last piece is the top-level Dockerfile, outside the bank folders:

FROM python:3.7-stretch
WORKDIR /app
COPY ./requirements.txt ./ 
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt
RUN pip3 install --upgrade PyMuPDF

COPY . /app

COPY ./ChaseOCR.py /app
COPY ./BancAmericaOCR.py /app
COPY ./WellsFargoOCR.py /app

EXPOSE 8080

CMD ["python3", "MainBankClass.py"]

After running docker-compose build, the containers and network are built successfully. The error occurs when I run docker run -v /Users:/Users: python3 python3 ~/BankStatementsDemoOCR/BankStatements/MainBankClass.py, and the error message is FileNotFoundError: [Errno 2] No such file or directory: 'BancAmericaOCR.py'

I am assuming that the container doesn't have BancAmericaOCR.py, but I have composed each .py file under the same network, and I don't think links is good practice since Docker recommends using networks here. What am I missing? Any help is much appreciated. Thanks in advance.
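
One possible contributor, offered as an assumption rather than a diagnosis: a relative path passed to `runpy.run_path` is resolved against the process's current working directory, not against the directory of the script that makes the call. A minimal sketch of anchoring the lookup to the caller's own directory follows; `run_sibling_script` and its `base_dir` parameter are hypothetical names, not part of the package described above.

```python
import runpy
from pathlib import Path


def run_sibling_script(script_name, base_dir=None):
    """Run a script by name from a known directory, independent of the cwd.

    `run_sibling_script` and `base_dir` are hypothetical, for illustration only.
    """
    # Default to the directory containing this module (e.g. next to __init__.py).
    if base_dir is None:
        base_dir = Path(__file__).resolve().parent
    script_path = Path(base_dir) / script_name
    if not script_path.is_file():
        raise FileNotFoundError(f"OCR script not found: {script_path}")
    # run_path returns the executed script's globals as a dict.
    return runpy.run_path(str(script_path), run_name="__main__")
```

With this in place, a call such as `run_sibling_script("BancAmericaOCR.py")` no longer depends on where `python3` was launched from, only on the script actually being present in that directory inside the container.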

liamsuma
  • Since Docker by design isolates container filesystems from the host, it's not a great match for processes that are focused heavily on local files. Docker Compose also is more designed for long-running processes and not batch jobs that will do some unit of work then exit. I might run this without Docker, in a Python virtual environment, driven by a shell script or with a Python-native driver. – David Maze Jul 29 '20 at 17:04
  • would you recommend to add `xx.sh` outside of each folder and change `CMD` to `["python3", "./xx.sh"]` then? @DavidMaze – liamsuma Jul 29 '20 at 17:06
  • I'd recommend [using a Python virtual environment](https://packaging.python.org/tutorials/installing-packages/) and not using Docker at all, as you've described the problem. – David Maze Jul 29 '20 at 19:06
  • I'd just make one Python container... Then use Flask, for example to dynamically run individual parser functions via REST API actions – OneCricketeer Aug 03 '20 at 18:28
  • @OneCricketeer thanks for your time and valuable input. Yes, we currently are running 1 container and I was overthinking the issue back then. – liamsuma Aug 03 '20 at 18:32

2 Answers

single application in a single container ... need networks for different py files to communicate

You only have one container. Docker networks are for multiple containers to talk to one another. Docker Compose also defines a default bridge network for all services, so you wouldn't need to create one even if you were still using docker-compose.

Here's a cleaned-up Dockerfile with all the scripts copied in, plus an entrypoint file:

FROM python:3.7-stretch
WORKDIR /app
COPY ./requirements.txt ./  
RUN pip3 install --upgrade pip PyMuPDF && pip3 install -r requirements.txt

COPY . /app

COPY ./docker-entrypoint.sh /
# Exec form runs the script directly instead of wrapping it in /bin/sh -c
ENTRYPOINT ["/docker-entrypoint.sh"]

In your entrypoint, you can loop over every script:

#!/bin/bash

for b in Chase WellsFargo BofA ; do 
    python3 /app/$b.py
done

exec python3 /app/MainBankClass.py
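
The same sequencing can also be sketched as a Python-native driver, along the lines suggested in the comments on the question. This is an illustration under assumptions, not a replacement for the entrypoint above: `run_pipeline` is a hypothetical helper, and unlike the shell loop it stops at the first script that exits with a non-zero status.

```python
import subprocess
import sys
from pathlib import Path


def run_pipeline(app_dir, scripts, main_script="MainBankClass.py"):
    """Run each script in order, then the main script (hypothetical helper)."""
    app_dir = Path(app_dir)
    for name in [*scripts, main_script]:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failing script aborts the remaining steps.
        subprocess.run(
            [sys.executable, str(app_dir / name)],
            check=True,
            cwd=app_dir,  # relative paths inside the scripts resolve here
        )
```

For the layout in the question, the call would look like `run_pipeline("/app", ["ChaseOCR.py", "WellsFargoOCR.py", "BancAmericaOCR.py"])`.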
OneCricketeer
  • I am running a quick test with your input and will update you asap. – liamsuma Aug 03 '20 at 18:40
  • I am getting **permission denied** error after modifying for loop to match correct py file names – liamsuma Aug 03 '20 at 18:51
  • still getting **permission denied** error when adding `RUN ["chmod", "+x", "/app/docker_entrypoint.sh"]` in dockerfile to make it executable – liamsuma Aug 03 '20 at 19:07
  • You need to chmod outside of the Dockerfile. Nothing should need to change in the Dockerfile I gave – OneCricketeer Aug 03 '20 at 22:06
  • What do you mean by outside of the Dockerfile? to execute that shell script in terminal? – liamsuma Aug 04 '20 at 13:56
  • `chmod +x docker-entrypoint.sh` once, then `docker build`. You don't need the RUN command. You also don't need the script in `/app` – OneCricketeer Aug 04 '20 at 15:24
  • worked like a charm! now I have to deal with tesseractnotfound issue in docker. I am accepting your answer instead. Thanks for your time and valuable input. – liamsuma Aug 04 '20 at 15:47

So, after days of searching, I am closing this thread with an implementation of a single application in a single container, as suggested at this link on the Docker forum. Instead of going with docker-compose, the suggested approach is to use one container with a Dockerfile for this application, and it is working as expected.

On top of the dockerfile, we also need networks for different py files to communicate. For example:

docker network create my_net
docker run -it --network my_net -v /Users:/Users --rm my_awesome_app

EDIT: No network is needed since we are only running one container.

EDIT 2: Please see the accepted answer for future reference

Any answers are welcome if anyone has better ideas for this case.

liamsuma