I have a Dataflow pipeline in Python 3.6 that copies data from a Pub/Sub topic into a GCS bucket. It works, but when I create a template version of it with DataflowRunner I get this error:
Pip install failed for package: -r
Output from execution of subprocess: b'Collecting apache-beam==2.27.0 (from -r ./requirements.txt (line 1))\r\n File was already downloaded c:\\users\\kaghole\\appdata\\local\\temp\\dataflow-requirements-cache\\apache-beam-2.27.0.zip\r\nCollecting avro-python3!=1.9.2,<1.10.0,>=1.8.1 (from apache-beam==2.27.0->-r ./requirements.txt (line 1))\r\n File was already downloaded c:\\users\\kaghole\\appdata\\local\\temp\\dataflow-requirements-cache\\avro-python3-1.9.2.1.tar.gz\r\nCollecting crcmod<2.0,>=1.7 (from apache-beam==2.27.0->-r ./requirements.txt (line 1))\r\n File was already downloaded c:\\users\\kaghole\\appdata\\local\\temp\\dataflow-requirements-cache\\crcmod-1.7.tar.gz\r\nCollecting dill<0.3.2,>=0.3.1.1 (from apache-beam==2.27.0->-r ./requirements.txt (line 1))\r\n File was already downloaded c:\\users\\kaghole\\appdata\\local\\temp\\dataflow-requirements-cache\\dill-0.3.1.1.tar.gz\r\nCollecting fastavro<2,>=0.21.4 (from apache-beam==2.27.0->-r ./requirements.txt (line 1))\r\n File was already downloaded c:\\users\\kaghole\\appdata\\local\\temp\\dataflow-requirements-cache\\fastavro-1.2.3.tar.gz\r\nCollecting future<1.0.0,>=0.18.2 (from apache-beam==2.27.0->-r ./requirements.txt (line 1))\r\n File was already downloaded c:\\users\\kaghole\\appdata\\local\\temp\\dataflow-requirements-cache\\future-0.18.2.tar.gz\r\nCollecting grpcio<2,>=1.29.0 (from apache-beam==2.27.0->-r ./requirements.txt (line 1))\r\n File was already downloaded c:\\users\\kaghole\\appdata\\local\\temp\\dataflow-requirements-cache\\grpcio-1.34.0.tar.gz\r\nCollecting hdfs<3.0.0,>=2.1.0 (from apache-beam==2.27.0->-r ./requirements.txt (line 1))\r\n File was already downloaded c:\\users\\kaghole\\appdata\\local\\temp\\dataflow-requirements-cache\\hdfs-2.5.8.tar.gz\r\nCollecting httplib2<0.18.0,>=0.8 (from apache-beam==2.27.0->-r ./requirements.txt (line 1))\r\n File was already downloaded 
c:\\users\\kaghole\\appdata\\local\\temp\\dataflow-requirements-cache\\httplib2-0.17.4.tar.gz\r\nCollecting mock<3.0.0,>=1.0.1 (from apache-beam==2.27.0->-r ./requirements.txt (line 1))\r\n File was already downloaded c:\\users\\kaghole\\appdata\\local\\temp\\dataflow-requirements-cache\\mock-2.0.0.tar.gz\r\nCollecting numpy<2,>=1.14.3 (from apache-beam==2.27.0->-r ./requirements.txt (line 1))\r\n File was already downloaded c:\\users\\kaghole\\appdata\\local\\temp\\dataflow-requirements-cache\\numpy-1.19.5.zip\r\n Installing build dependencies: started\r\n Installing build dependencies: still running...\r\n Installing build dependencies: finished with status \'error\'\r\n Complete output from command C:\\Users\\kaghole\\retention_for_retention\\retentionenv\\Scripts\\python.exe C:\\Users\\kaghole\\retention_for_retention\\retentionenv\\lib\\site-packages\\pip-19.0.3-py3.6.egg\\pip install --ignore-installed --no-user --prefix C:\\Users\\kaghole\\AppData\\Local\\Temp\\pip-build-env-w2zl8_gl\\overlay --no-warn-script-location --no-binary :all: --only-binary :none: -i https://pypi.org/simple -- setuptools<49.2.0 wheel<=0.35.1 Cython>=0.29.21,<3.0:\r\n Collecting setuptools<49.2.0\r\n Using cached https://files.pythonhosted.org/packages/d0/4a/22ee76842d8ffc123d4fc48d24a623c1d206b99968fe3960039f1efc2cbc/setuptools-49.1.3.zip\r\n Collecting wheel<=0.35.1\r\n Using cached https://files.pythonhosted.org/packages/83/72/611c121b6bd15479cb62f1a425b2e3372e121b324228df28e64cc28b01c2/wheel-0.35.1.tar.gz\r\n Collecting Cython<3.0,>=0.29.21\r\n Using cached https://files.pythonhosted.org/packages/6c/9f/f501ba9d178aeb1f5bf7da1ad5619b207c90ac235d9859961c11829d0160/Cython-0.29.21.tar.gz\r\n Installing collected packages: setuptools, wheel, Cython\r\n Running setup.py install for setuptools: started\r\n Running setup.py install for setuptools: finished with status \'done\'\r\n Running setup.py install for wheel: started\r\n Running setup.py install for wheel: finished with status 
\'done\'\r\n Running setup.py install for Cython: started\r\n Running setup.py install for Cython: finished with status \'error\'\r\n Complete output from command C:\\Users\\kaghole\\retention_for_retention\\retentionenv\\Scripts\\python.exe -u -c "import setuptools, tokenize;__file__=\'C:\\\\Users\\\\kaghole\\\\AppData\\\\Local\\\\Temp\\\\pip-install-nn3jr_0n\\\\Cython\\\\setup.py\';f=getattr(tokenize, \'open\', open)(__file__);code=f.read().replace(\'\\r\\n\', \'\\n\');f.close();exec(compile(code, __file__, \'exec\'))" install --record C:\\Users\\kaghole\\AppData\\Local\\Temp\\pip-record-y1h5732j\\install-record.txt --single-version-externally-managed --prefix C:\\Users\\kaghole\\AppData\\Local\\Temp\\pip-build-env-w2zl8_gl\\overlay --compile --install-headers C:\\Users\\kaghole\\retention_for_retention\\retentionenv\\include\\site\\python3.6\\Cython:\r\n Unable to find pgen, not compiling formal grammar.\r\n running install\r\n running build\r\n running build_py\r\n creating build\r\n creating build\\lib.win-amd64-3.6\r\n copying cython.py -> build\\lib.win-amd64-3.6\r\n creating build\\lib.win-amd64-3.6\\Cython\r\n copying Cython\\CodeWriter.py -> build\\lib.win-amd64-3.6\\Cython\r\n copying Cython\\Coverage.py -> build\\lib.win-amd64-3.6\\Cython\r\n copying Cython\\Debugging.py -> build\\lib.win-amd64-3.6\\Cython\r\n copying Cython\\Shadow.py -> build\\lib.win-amd64-3.6\\Cython\r\n copying Cython\\StringIOTree.py -> build\\lib.win-amd64-3.6\\Cython\r\n copying Cython\\TestUtils.py -> build\\lib.win-amd64-3.6\\Cython\r\n copying Cython\\Utils.py -> build\\lib.win-amd64-3.6\\Cython\r\n copying Cython\\__init__.py -> build\\lib.win-amd64-3.6\\Cython\r\n creating build\\lib.win-amd64-3.6\\Cython\\Build\r\n copying Cython\\Build\\BuildExecutable.py -> build\\lib.win-amd64-3.6\\Cython\\Build\r\n copying Cython\\Build\\Cythonize.py -> build\\lib.win-amd64-3.6\\Cython\\Build\r\n copying Cython\\Build\\Dependencies.py -> build\\lib.win-amd64-3.6\\Cython\\Build\r\n 
copying Cython\\Build\\Distutils.py -> build\\lib.win-amd64-3.6\\Cython\\Build\r\n copying Cython\\Build\\Inline.py -> build\\lib.win-amd64-3.6\\Cython\\Build\r\n copying Cython\\Build\\IpythonMagic.py -> build\\lib.win-amd64-3.6\\Cython\\Build\r\n copying Cython\\Build\\__init__.py -> build\\lib.win-amd64-3.6\\Cython\\Build\r\n creating build\\lib.win-amd64-3.6\\Cython\\Compiler\r\n copying Cython\\Compiler\\AnalysedTreeTransforms.py -> build\\lib.win-amd64-3.6\\Cython\\Compiler\r\n copying Cython\\Compiler\\Annotate.py -> build\\lib.win-amd64-3.6\\Cython\\Compiler\r\n copying Cython\\Compiler\\AutoDocTransforms.py -> build\\lib.win-amd64-3.6\\Cython\\Compiler\r\n copying Cython\\Compiler\\Buffer.py -> build\\lib.win-amd64-3.6\\Cython\\Compiler\r\n copying Cython\\Compiler\\Builtin.py -> build\\lib.win-amd64-3.6\\Cython\\Compiler\r\n copying Cython\\Compiler\\CmdLine.py -> build\\lib.win-amd64-3.6\\Cython\\Compiler\r\n copying Cython\\Compiler\\Code.py -> build\\lib.win-amd64-3.6\\Cython\\Compiler\r\n copying build_ext\r\n building \'Cython.Plex.Scanners\' extension\r\n error: Microsoft Visual C++ 14.0 is required. 
Get it with "Microsoft Visual C++ Build Tools": https://visualstudio.microsoft.com/downloads/\r\n \r\n ----------------------------------------\r\n Command "C:\\Users\\kaghole\\retention_for_retention\\retentionenv\\Scripts\\python.exe -u -c "import setuptools, tokenize;__file__=\'C:\\\\Users\\\\kaghole\\\\AppData\\\\Local\\\\Temp\\\\pip-install-nn3jr_0n\\\\Cython\\\\setup.py\';f=getattr(tokenize, \'open\', open)(__file__);code=f.read().replace(\'\\r\\n\', \'\\n\');f.close();exec(compile(code, __file__, \'exec\'))" install --record C:\\Users\\kaghole\\AppData\\Local\\Temp\\pip-record-y1h5732j\\install-record.txt --single-version-externally-managed --prefix C:\\Users\\kaghole\\AppData\\Local\\Temp\\pip-build-env-w2zl8_gl\\overlay --compile --install-headers C:\\Users\\kaghole\\retention_for_retention\\retentionenv\\include\\site\\python3.6\\Cython" failed with error code 1 in C:\\Users\\kaghole\\AppData\\Local\\Temp\\pip-install-nn3jr_0n\\Cython\\\r\n \r\n ----------------------------------------\r\nCommand "C:\\Users\\kaghole\\retention_for_retention\\retentionenv\\Scripts\\python.exe C:\\Users\\kaghole\\retention_for_retention\\retentionenv\\lib\\site-packages\\pip-19.0.3-py3.6.egg\\pip install --ignore-installed --no-user --prefix C:\\Users\\kaghole\\AppData\\Local\\Temp\\pip-build-env-w2zl8_gl\\overlay --no-warn-script-location --no-binary :all: --only-binary :none: -i https://pypi.org/simple -- setuptools<49.2.0 wheel<=0.35.1 Cython>=0.29.21,<3.0" failed with error code 1 in None\r\n'
I am using the following deployment command:
python -m df-pubsubRead-gcsWrite-Op --requirements_file requirements.txt --runner DataflowRunner --project ing-dev --staging_location gs://my_bucket/staging --temp_location gs://my_bucket/temp --template_location gs://my_bucket/templates/test/df-pubsubRead-gcsWrite-Op
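Pipeline options such as the ones in this command are parsed with Python's argparse, and tokens the parser does not recognise are reported as "Discarding unparseable args". The stdlib sketch below illustrates that split. It assumes, without checking Beam's internals, that the behaviour resembles argparse.parse_known_args. The three flags it declares are a hypothetical subset chosen for illustration, not Beam's full option list.

```python
import argparse


def parse_setup_flags(argv):
    """Split argv into recognised setup flags and leftover tokens.

    Assumption: this loosely mimics how Beam's PipelineOptions separates
    flags it knows from the ones it reports as "unparseable".
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--requirements_file")
    parser.add_argument("--setup_file")
    # A boolean switch consumes no value, so a token after it is left over.
    parser.add_argument("--save_main_session", action="store_true")
    # parse_known_args returns (namespace, list_of_unrecognised_tokens).
    return parser.parse_known_args(argv)


if __name__ == "__main__":
    known, leftover = parse_setup_flags(["--setup_file", "./setup.py"])
    print(known.setup_file, leftover)  # ./setup.py []

    known, leftover = parse_setup_flags(["setup.py", "--save_main_session", "True"])
    print(known.setup_file, leftover)  # None ['setup.py', 'True']
```

Under this assumption, a bare setup.py that is not attached to --setup_file, or a True after a boolean switch, ends up in the leftovers. Whether that explains the warning below depends on the exact command that was run, which is not shown.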
The requirements.txt file:
apache-beam[gcp]==2.27.0
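The nested pip command in the log contains `--no-binary :all:`, which means only source distributions are fetched. Preparing numpy's sdist then tries to build Cython from source, and that is where the "Microsoft Visual C++ 14.0 is required" error appears. The sketch below reproduces a download of that shape locally so the failure can be checked outside Dataflow. The exact command Beam's stager issues is an assumption inferred from the log, not confirmed.

```python
import subprocess
import sys
import tempfile


def build_download_cmd(requirements_file, cache_dir):
    """Build a pip command that fetches source distributions only.

    Assumption: this approximates the staging step suggested by the
    "--no-binary :all:" option visible in the error log.
    """
    return [
        sys.executable, "-m", "pip", "download",
        "--dest", cache_dir,
        "-r", requirements_file,
        "--exists-action", "i",
        # Refuse prebuilt wheels: sdists are fetched, and preparing their
        # metadata may build dependencies (e.g. Cython for numpy) from
        # source, which can need a C compiler on Windows.
        "--no-binary", ":all:",
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        cmd = build_download_cmd("requirements.txt", cache_dir)
        # stdout=PIPE instead of capture_output keeps this Python 3.6-compatible.
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        # Decode the raw bytes so "\r\n" sequences print as real line breaks.
        print(result.stdout.decode(errors="replace"))
        print("pip exit code:", result.returncode)
```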
I tried:
Using setup.py, as suggested in the question "Dataflow fails when I add requirements.txt [Python]", but the setup_file argument is discarded:
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['setup.py', 'True']
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['setup.py', 'True']
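For reference, a minimal setup.py of the kind the linked question describes might look like the sketch below. The distribution name and version are hypothetical placeholders, and the only dependency listed is the pin from requirements.txt. setuptools is imported inside main() only so the module can be inspected where setuptools is not installed.

```python
"""Minimal setup.py sketch for passing to Dataflow with --setup_file."""

# Mirrors the single pin from the requirements.txt shown above.
REQUIRED_PACKAGES = [
    "apache-beam[gcp]==2.27.0",
]


def main():
    # Imported lazily so the constants above can be inspected
    # even where setuptools is not installed.
    import setuptools

    setuptools.setup(
        name="df-pubsub-gcs-pipeline",  # hypothetical distribution name
        version="0.0.1",                # hypothetical version
        install_requires=REQUIRED_PACKAGES,
        packages=setuptools.find_packages(),
    )


if __name__ == "__main__":
    main()
```

Such a file would be referenced with the --setup_file option followed by its path (for example ./setup.py), so the parser attaches the path to the flag instead of treating it as a stray token.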
Not including the requirements file: this successfully creates the template, but the job then fails because apache-beam is not installed. In other words, I have to specify dependencies, unless there is another way to install them on Dataflow.