I need to create a batch-processing, data-intensive application. I have created synthetic data, which is saved in "data", and after running my script in "data_ingestion" I also have the database in "db". The application runs with Flask. I have written microservice Python scripts for "data_ingestion", "data_processing" and "data_aggregation". I created a Dockerfile for data_aggregation, and the image was built without any issues. When I do the same for data_ingestion, I get a "file not found" error.

Dockerfile:

# Use an official Python runtime as the base image
FROM python:3.9

# Set the working directory in the container
WORKDIR /data2

# Copy the CSV file and the database file to the container
COPY ./data/financial_data.csv ./data/
COPY ./db/financial_data.db ./db/
COPY ./data_ingestion.py .

# Expose port 5000 (or any other port your Flask app is listening on)
EXPOSE 5000

# Run the Flask app when the container launches
CMD ["python", "data_ingestion.py"]

Error:

 => [internal] load .dockerignore                                  0.1s
 => => transferring context: 2B                                    0.0s
 => [internal] load build definition from Dockerfile               0.1s
 => => transferring dockerfile: 511B                               0.1s
 => [internal] load metadata for docker.io/library/python:3.9      1.4s
 => [1/5] FROM docker.io/library/python:3.9@sha256:9ba             0.4s
 => => resolve docker.io/library/python:3.9@sha256                 0.4s
 => [internal] load build context                                  0.1s
 => => transferring context: 39B                                   0.0s
 => CACHED [2/5] WORKDIR /data2                                    0.0s
 => CACHED [3/5] COPY ./data/financial_data.csv ./data/            0.0s
 => ERROR [4/5] COPY ./db/financial_data.db ./db/                  0.0s
------
 > [4/5] COPY ./db/financial_data.db ./db/:
------
Dockerfile:9
--------------------
   7 |     # Copy the CSV file and the database file to the container
   8 |     COPY ./data/financial_data.csv ./data/
   9 | >>> COPY ./db/financial_data.db ./db/
  10 |     COPY ./data_ingestion.py .
  11 |
--------------------
ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref moby::mvomn3m1y0lfugm6sk6fdvdiw: "/db/financial_data.db": not found

My folder structure:

C:.
│   .gitignore
│   bashlog4docker.txt
│   Contributing.md
│   docker-compose.yml
│   License.md
│   README.md
│   redundant_file.txt
│
├───app
│   │   app.py
│   │   Readme.txt
│   │
│   ├───static
│   ├───templates
│   │       data_aggregation_results.html
│   │       data_ingestion_results.html
│   │       error.html
│   │       index.html
│   │       processed_data_results.html
│   │       success.html
│   │
│   ├───tests
│   │   │   test_app.py
│   │   │   __init__.py
│   │   │
│   │   ├───.pytest_cache
│   │   │   │   .gitignore
│   │   │   │   CACHEDIR.TAG
│   │   │   │   README.md
│   │   │   │
│   │   │   └───v
│   │   │       └───cache
│   │   │               lastfailed
│   │   │               nodeids
│   │   │               stepwise
│   │   │
│   │   └───__pycache__
│   │           test_app.cpython-39-pytest-6.2.5.pyc
│   │           __init__.cpython-39.pyc
│   │
│   └───__pycache__
│           app.cpython-39.pyc
│
├───data
│       financial_data.csv
│
├───data_aggregation
│   │   data_aggregation.log
│   │   data_aggregation.py
│   │   Dockerfile
│   │   Readme.txt
│   │   requirements.txt
│   │   __init__.py
│   │
│   ├───tests
│   │   │   test_data_aggregation.py
│   │   │   __init__.py
│   │   │
│   │   └───__pycache__
│   │           test_data_aggregation.cpython-310.pyc
│   │           __init__.cpython-310.pyc
│   │
│   └───__pycache__
│           data_aggregation.cpython-310.pyc
│           data_aggregation.cpython-39.pyc
│           data_aggregation_abs_path.cpython-39.pyc
│           __init__.cpython-39.pyc
│
├───data_ingestion
│   │   data_ingestion.log
│   │   data_ingestion.py
│   │   Dockerfile
│   │   Readme.txt
│   │   requirements.txt
│   │   __init__.py
│   │
│   ├───tests
│   │   │   test_data_ingestion.py
│   │   │   __init__.py
│   │   │
│   │   └───__pycache__
│   │           test_data_ingestion.cpython-310.pyc
│   │
│   └───__pycache__
│           data_ingestion.cpython-310.pyc
│           data_ingestion.cpython-39.pyc
│           data_ingestion_abs_path.cpython-39.pyc
│           __init__.cpython-39.pyc
│
├───data_processing
│   │   data_processing.log
│   │   data_processing.py
│   │   Dockerfile
│   │   Readme.txt
│   │   requirements.txt
│   │   __init__.py
│   │
│   ├───tests
│   │   │   test_data_processing.py
│   │   │   __init__.py
│   │   │
│   │   └───__pycache__
│   │           test_data_processing.cpython-310.pyc
│   │           __init__.cpython-39.pyc
│   │
│   └───__pycache__
│           data_preprocessing_abs_path.cpython-39.pyc
│           data_processing.cpython-310.pyc
│           data_processing.cpython-39.pyc
│           __init__.cpython-39.pyc
│
├───db
│       Dockerfile
│       financial_data.db
│
└───__pycache__
        1_data_ingestion.cpython-39.pyc
        2_data_preprocessing.cpython-39.pyc
        3_data_aggregation.cpython-39.pyc
        6_data_storage_retrieval.cpython-39.pyc
        __1_data_ingestion.cpython-39.pyc
        __2_data_preprocessing.cpython-39.pyc
        __3_data_aggregation.cpython-39.pyc
        __4_data_validation.cpython-39.pyc
        __5_data_analysis.cpython-39.pyc

I know my folder structure is not conventional. This is my first assignment from the university. What I need to know: why can't I build an image with this Dockerfile? Why does it throw the error only for the db file and not for the CSV? Is Docker not able to copy files from other folders? When I copy the files manually into "data_ingestion", the image is built, although when I run it, the container stops after a few seconds. I don't know why.

Happy for any constructive suggestion and solution. Thank you in advance!

I have tried changing the paths, but nothing really worked. I have searched the web for my problem, but with "dockerfile" the results are not precise enough. The Docker documentation says to use "from...copy", but that did not work either.

waqas
  • Share the folder structure by sharing the output of `tree` command. – shaik moeed Aug 25 '23 at 05:53
  • @shaikmoeed do you mean this? ├── app │ ├── static │ │ └── ... (CSS, JS, images) │ ├── templates │ │ └── ... (HTML templates) │ ├── app.py │ └── README.md ├── data_ingestion │ ├── financial_data.csv │ ├── data_ingestion.py │ ├── data_ingestion.log │ └── README.md ├── data_processing – waqas Aug 25 '23 at 13:00
  • Yes, add it to your post by editing it. – shaik moeed Aug 25 '23 at 13:32
  • @shaikmoeed just did. Do you have an idea how I can proceed here? Or what the best practices are for working with microservices and Docker when creating a data-intensive application? – waqas Aug 26 '23 at 09:38
  • Does this answer your question? [Docker: adding a file from a parent directory](https://stackoverflow.com/questions/24537340/docker-adding-a-file-from-a-parent-directory) – shaik moeed Aug 26 '23 at 10:05
  • @shaikmoeed I have tried that too. The thing is that I am working with ChatGPT and it has suggested all the different methods and possibilities that are currently on the web. It actually suggested asking a community for help. I think it's a human error somewhere (why the CSV works and the db does not) that ChatGPT is not able to solve. That is the reason for this post. My university is online and I have reached out to my professor as well, but no response yet. – waqas Aug 26 '23 at 10:09
  • What is the dockerfile inside db folder? – shaik moeed Aug 26 '23 at 10:32
  • I suggest keeping all the Dockerfiles, with different names, in a single folder at the root location. That way, most of these issues won't occur. – shaik moeed Aug 26 '23 at 10:37
  • @shaikmoeed I think it was suggested by ChatGPT and I left it there without ever using it:

        # Use an official SQLite image as the base image
        FROM alpine:latest
        # Install SQLite
        RUN apk add --no-cache sqlite
        # Create a directory for the database file
        RUN mkdir /data
        # Copy the financial_data.db file to the /data directory in the container
        COPY financial_data.db /data/
        # Set the working directory to /data
        WORKDIR /data

    I will try your approach. But I thought the Dockerfiles need to be placed in the same folder as the script. – waqas Aug 26 '23 at 10:58

1 Answer

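So, the issue is that `COPY` can only reach files inside the build context (the directory passed as the last argument to `docker build`) and its subfolders; it cannot go up to a parent folder. I adapted the solution from https://stackoverflow.com/a/64404455/15682324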

    The solution that I have applied now:
    WORKDIR /data2   # a directory inside the container; relative COPY destinations resolve against it
    
    # Copy the CSV file and the database file to the container
    COPY child-folder/financial_data.csv /parentfolder/
    COPY child-folder/financial_data.db /parentfolder/
    
    Run the build in bash from the project root, so that the root (the trailing ".") is the build context and its subfolders are reachable by COPY. Note the space before the final dot: docker build -t <image_name> -f <unique_Dockerfile_name> .
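
To make the build-context rule concrete, here is a minimal Python sketch (not part of the original project) that reports which `COPY` sources of a Dockerfile are missing under a given context directory. The helper name `missing_copy_sources` and the temporary folder layout are assumptions chosen to mirror the tree in the question; the parser is deliberately naive and ignores globs, quoted paths, and line continuations.

```python
import tempfile
from pathlib import Path

# The three COPY lines from the question's data_ingestion Dockerfile.
DOCKERFILE = """\
FROM python:3.9
WORKDIR /data2
# Copy the CSV file and the database file to the container
COPY ./data/financial_data.csv ./data/
COPY ./db/financial_data.db ./db/
COPY ./data_ingestion.py .
"""


def missing_copy_sources(dockerfile_text: str, context_dir: Path) -> list:
    """Return COPY sources that do not exist inside the build context."""
    missing = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0].upper() != "COPY":
            continue
        # COPY --from=... reads from another stage or image, not the context.
        if any(p.startswith("--from") for p in parts[1:]):
            continue
        args = [p for p in parts[1:] if not p.startswith("--")]
        for src in args[:-1]:  # the last argument is the destination
            if not (context_dir / src).exists():
                missing.append(src)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical stand-in for the project layout shown in the question.
    for rel in (
        "data/financial_data.csv",
        "db/financial_data.db",
        "data_ingestion/data_ingestion.py",
    ):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    # Context = data_ingestion/ (building from inside the service folder)
    from_subfolder = missing_copy_sources(DOCKERFILE, root / "data_ingestion")
    # Context = project root (docker build -f data_ingestion/Dockerfile .)
    from_root = missing_copy_sources(DOCKERFILE, root)

print(from_subfolder)  # ['./data/financial_data.csv', './db/financial_data.db']
print(from_root)       # ['./data_ingestion.py']
```

With the service folder as context, neither `data/` nor `db/` is visible. With the project root as context, both are found, but the script path in the Dockerfile would then need to become `data_ingestion/data_ingestion.py`.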

Further, I found out that I could have avoided the above steps by removing the database from the Dockerfile and mentioning it in the docker-compose YAML file instead.
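
If the database is supplied at run time (for example through a volume declared in the YAML file) rather than copied into the image, the script needs to know where to find it. A minimal sketch of one way to do that follows. It assumes the database is SQLite and that the location is passed in an environment variable; the name `DB_PATH`, the default path, and both helper functions are hypothetical and not taken from the original `data_ingestion.py`.

```python
import os
import sqlite3
from pathlib import Path

# Hypothetical default, used when the container sets no DB_PATH.
DEFAULT_DB_PATH = "db/financial_data.db"


def resolve_db_path(env=os.environ) -> Path:
    """Pick the database location from the environment, else the default."""
    return Path(env.get("DB_PATH", DEFAULT_DB_PATH))


def open_db(env=os.environ) -> sqlite3.Connection:
    """Open the SQLite database, creating its parent folder if needed."""
    path = resolve_db_path(env)
    path.parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(path)


# Example: the path a container would use if DB_PATH pointed at a mounted folder.
print(resolve_db_path({"DB_PATH": "/data2/db/financial_data.db"}))
```

With this approach the image only contains the script, and the same image can be pointed at a different database file without being rebuilt.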

waqas