I am trying to run a Dataflow job using a Flex Template, but I keep getting a 'module not found' error and I cannot figure out why. Here is the structure of my directory:
|__ modules
|____ edgar_quarterly_form4.py
|____ __init__.py
|__ main.py
|__ setup.py
|__ __init__.py
My main.py has this import in its code:
from modules import edgar_quarterly_form4
And here's my Dockerfile:
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base
ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
RUN mkdir -p ${WORKDIR}/modules
WORKDIR ${WORKDIR}
COPY spec/python_command_spec.json ${WORKDIR}/python_command_spec.json
COPY modules ${WORKDIR}/modules
ENV DATAFLOW_PYTHON_COMMAND_SPEC ${WORKDIR}/python_command_spec.json
RUN pip install avro-python3 pyarrow==0.15.1 apache-beam[gcp]==2.27.0
COPY __init__.py ${WORKDIR}/__init__.py
COPY setup.py ${WORKDIR}/setup.py
COPY main.py ${WORKDIR}/main.py
# Super important to add these lines.
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
And here's my setup.py file:
import setuptools

REQUIRED_PACKAGES = [
    'numpy',
    'beautifulsoup4',
    'pandas',
    'sendgrid==6.2.1',
    'lxml',
    'pandas_datareader',
    'apache-beam[gcp]==2.27.0',
]

setuptools.setup(
    packages=setuptools.find_packages(),
    install_requires=REQUIRED_PACKAGES,
)
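As a sanity check on the setup.py above, here is a minimal sketch that recreates the same file layout in a temporary directory and prints what setuptools.find_packages() discovers there. The helper name discover_packages and the temporary directory are hypothetical and only for illustration; they are not part of the project.

```python
import pathlib
import tempfile

import setuptools


def discover_packages(layout):
    """Create the given relative file paths in a temp dir and return find_packages()."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        for rel in layout:
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()  # empty placeholder file
        # find_packages only reports directories that contain an __init__.py
        return setuptools.find_packages(where=str(root))


# Same layout as the directory tree shown at the top of the question
LAYOUT = [
    "modules/edgar_quarterly_form4.py",
    "modules/__init__.py",
    "main.py",
    "setup.py",
    "__init__.py",
]

found = discover_packages(LAYOUT)
print(found)
```

If this prints a list containing 'modules', then package discovery in setup.py is at least finding the directory, which would suggest looking at how the package reaches the workers rather than at find_packages() itself.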
However, every time my template runs I get this error:
368, in load_session module = unpickler.load()
File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load obj = StockUnpickler.load(self)
File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 827, in _import_module return getattr(__import__(module, None, None, [obj]), obj)
ModuleNotFoundError: No module named 'modules'
And I cannot figure out why. I added some echo statements to my Dockerfile to check whether all of the files were copied, and they were all copied to the image successfully, so I cannot work out what is going on. Please note that I get exactly the same error even if the edgar_quarterly_form4.py file is in the same directory as main.py.
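Confirming that the files exist in the image is not quite the same as confirming that Python can resolve the package. A minimal sketch of an importability check follows, assuming it is run with the same interpreter and working directory as the failing process. The helper name is_importable is hypothetical.

```python
import importlib.util
import sys


def is_importable(name):
    """Return True if the top-level module `name` can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


# Show where the interpreter is looking, then test the package from the traceback
print("sys.path:", sys.path)
print("modules importable:", is_importable("modules"))
```

Running this inside the launcher container and, if possible, on a worker would show whether the two environments differ in what they can import.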
Kind regards,
Marco