
I am attempting to run a Dataflow job using a Flex Template, but I am stuck on a "module not found" error and cannot figure out why. Here is the structure of my directory:

|__ modules
  |____ edgar_quarterly_form4.py
  |____ __init__.py
|__ main.py
|__ setup.py
|__ __init__.py

My main.py has this import:

from modules import edgar_quarterly_form4

And here's my Dockerfile:

FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
RUN mkdir -p ${WORKDIR}/modules
WORKDIR ${WORKDIR}

COPY spec/python_command_spec.json ${WORKDIR}/python_command_spec.json
COPY modules ${WORKDIR}/modules

ENV DATAFLOW_PYTHON_COMMAND_SPEC ${WORKDIR}/python_command_spec.json

RUN pip install avro-python3 pyarrow==0.15.1 apache-beam[gcp]==2.27.0

COPY __init__.py ${WORKDIR}/__init__.py
COPY setup.py ${WORKDIR}/setup.py 
COPY main.py ${WORKDIR}/main.py
# Super important to add these lines.
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"

And here's my setup.py file

import setuptools

REQUIRED_PACKAGES = [
    'numpy',
    'beautifulsoup4',
    'pandas',
    'sendgrid==6.2.1',
    'lxml',
    'pandas_datareader',
    'apache-beam[gcp]==2.27.0',
]

setuptools.setup(
    packages=setuptools.find_packages(),
    install_requires=REQUIRED_PACKAGES,
)

However, every time my template runs I get this error:

368, in load_session
    module = unpickler.load()
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 827, in _import_module
    return getattr(__import__(module, None, None, [obj]), obj)
ModuleNotFoundError: No module named 'modules'

I cannot figure out why. I added some echo statements to my Dockerfile to check whether all of the files were copied, and they were all copied successfully to the image, so I cannot work out what is going on. Please note that I get exactly the same error even if the edgar_quarterly_form4.py file is in the same directory as main.py.
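Checking that the files exist in the image is not quite the same as checking that Python can import the package. A minimal sketch of an importability check follows. The helper name is_importable is hypothetical, and /dataflow/template is assumed to be the WORKDIR from the Dockerfile above. It only tests the container it runs in. The traceback above comes from dill's load_session, which typically runs on the Dataflow workers, so a passing check in the launcher image would not by itself rule out the error.

```python
import importlib
import importlib.util
import sys


def is_importable(module_name, extra_path=None):
    """Return True if module_name resolves on sys.path.

    extra_path, if given, is temporarily prepended to sys.path for the lookup.
    """
    if extra_path is not None:
        sys.path.insert(0, extra_path)
    try:
        # Make sure freshly created directories are seen by the import system.
        importlib.invalidate_caches()
        return importlib.util.find_spec(module_name) is not None
    finally:
        if extra_path is not None:
            sys.path.remove(extra_path)


if __name__ == "__main__":
    # Assumption: /dataflow/template is the WORKDIR from the Dockerfile above.
    print("modules importable:", is_importable("modules", "/dataflow/template"))
```

Running this inside the built image (for example through a temporary RUN step) shows whether the package resolves there. It says nothing about the worker environment.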

Kind regards, Marco

Progman
user1068378
  • Check if the following solution would work for you: https://stackoverflow.com/a/71839761/7611838 – Idhem Apr 12 '22 at 08:57

1 Answer


OK, it seems that with Beam 2.27 this solution does not work. Instead, you should follow what is outlined in this thread:

Including another file in Dataflow Python flex template, ImportError

You'll have to add a setup_file parameter to your template metadata and pass it when you run the template:
--parameters setup_file=
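As a rough illustration of those two steps, here is a small sketch that builds a metadata document declaring setup_file and assembles the matching gcloud invocation. Several things here are assumptions rather than part of the original answer: the value /dataflow/template/setup.py is inferred from the WORKDIR in the question's Dockerfile, and the job name, bucket path, region, label and help text are placeholders. The helper names build_metadata and build_run_command are hypothetical.

```python
import json
import shlex

# Assumption: setup.py sits in the WORKDIR used by the question's Dockerfile.
SETUP_FILE_IN_IMAGE = "/dataflow/template/setup.py"


def build_metadata(name, description):
    """Return Flex Template metadata that declares an optional setup_file parameter."""
    return {
        "name": name,
        "description": description,
        "parameters": [
            {
                "name": "setup_file",
                "label": "Path to setup.py inside the launcher image",
                "helpText": "Used to package local modules for the workers.",
                "isOptional": True,
            }
        ],
    }


def build_run_command(job_name, template_gcs_path, region):
    """Return the gcloud invocation as an argument list (all values are placeholders)."""
    return [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        "--template-file-gcs-location", template_gcs_path,
        "--region", region,
        "--parameters", f"setup_file={SETUP_FILE_IN_IMAGE}",
    ]


if __name__ == "__main__":
    # Placeholder names and paths, for illustration only.
    print(json.dumps(build_metadata("edgar-form4", "EDGAR quarterly Form 4 pipeline"), indent=2))
    print(shlex.join(build_run_command(
        "edgar-form4-run", "gs://my-bucket/templates/edgar.json", "us-central1")))
```

The printed JSON can be saved as the metadata file used when building the template, and the printed command shows where the setup_file value goes at launch time.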

user1068378