I am using GCP Composer to orchestrate our ETL.
When I created the environment, I set the Python version to Python 3.
One of the tasks uses DataFlowPythonOperator. It works fine when triggered from our local dev Docker instance (Airflow v1.10.1 + Python 3.6.9): run from that Docker image, the Dataflow job uses the Apache Beam Python 3.6 SDK 2.16.0.
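To compare what actually runs in each environment, the task can log its interpreter before doing any work. The sketch below uses two hypothetical helpers, `describe_interpreter` and `require_python3`, meant to be called from a task callable or at the top of the Beam pipeline file. It uses only the standard library and avoids f-strings, so it also runs under Python 2.7.

```python
import platform
import sys


def describe_interpreter():
    """Return a short description of the running interpreter.

    Hypothetical helper: log its result in both the local Docker run and the
    Composer run so the two can be compared side by side.
    """
    major, minor, micro = sys.version_info[:3]
    return "{} {}.{}.{} ({})".format(
        platform.python_implementation(), major, minor, micro, sys.executable
    )


def require_python3():
    """Fail fast with a clear message instead of crashing mid-transformation."""
    if sys.version_info[0] < 3:
        raise RuntimeError("Expected Python 3, got " + describe_interpreter())


if __name__ == "__main__":
    print(describe_interpreter())
    require_python3()
```

Failing early this way surfaces the version mismatch in the task log instead of as an error deep inside a transformation.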
Whenever we deploy to composer-1.7.9-airflow-1.10.1, however, the task runs with Python 2.7, and the Dataflow job it creates always uses Google Cloud Dataflow SDK for Python 2.5.0.
Composer uses Python 2.7 by default, and that causes many of our transformations to crash.
I can't find a way to configure Composer to use Python 3.x to create and run the Dataflow job.
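The SDK a Dataflow job reports generally follows the Python environment that submits it, which fits the Python 3.6 / SDK 2.16.0 versus Python 2.7 / SDK 2.5.0 split described above. That makes the interpreter used to launch the pipeline file the thing to pin down. The sketch below is a hypothetical helper, not the DataFlowPythonOperator API in Airflow 1.10.1. It builds a command line for a pipeline file with an explicit `py_interpreter` (an assumed parameter name) and turns an options dict into `--key=value` flags. The file name and project ID in the usage block are placeholders.

```python
import shlex
import subprocess


def build_pipeline_command(py_file, options, py_interpreter="python3", py_options=None):
    """Build argv for launching a Beam pipeline file with an explicit interpreter.

    Hypothetical helper: `py_interpreter` and the option names are assumptions
    for illustration, not parameters of DataFlowPythonOperator.
    """
    argv = [py_interpreter] + list(py_options or []) + [py_file]
    # Sort keys so the generated command is deterministic and easy to diff.
    for key in sorted(options):
        argv.append("--{}={}".format(key, options[key]))
    return argv


def launch_pipeline(argv):
    """Run the command; raises CalledProcessError if submission fails."""
    print("Launching: " + " ".join(shlex.quote(a) for a in argv))
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    cmd = build_pipeline_command(
        "etl_pipeline.py",  # hypothetical pipeline file
        {"runner": "DataflowRunner", "project": "my-project", "region": "us-central1"},
    )
    print(cmd)  # inspect only; call launch_pipeline(cmd) to actually submit
```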
Command:
$ gcloud composer environments describe etl --location us-central1
Result:
softwareConfig:
  imageVersion: composer-1.7.9-airflow-1.10.1
  pythonVersion: '3'
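To check these fields from a script rather than by eye, the same command can emit JSON through gcloud's global `--format=json` flag. The sketch below assumes the fields sit under `config.softwareConfig`, as in the Composer Environment resource, and falls back to a top-level `softwareConfig` so it also accepts an excerpt shaped like the one above. The function name `software_config` is hypothetical.

```python
import json


def software_config(describe_json):
    """Extract (imageVersion, pythonVersion) from `describe --format=json` output.

    Assumes the Composer Environment layout (config.softwareConfig); falls back
    to a top-level softwareConfig if only that excerpt is passed in.
    """
    env = json.loads(describe_json)
    sw = env.get("config", {}).get("softwareConfig") or env.get("softwareConfig", {})
    return sw.get("imageVersion"), sw.get("pythonVersion")


if __name__ == "__main__":
    # Sample shaped like the output above; in practice capture the output of:
    #   gcloud composer environments describe etl --location us-central1 --format=json
    sample = (
        '{"config": {"softwareConfig": {"imageVersion": '
        '"composer-1.7.9-airflow-1.10.1", "pythonVersion": "3"}}}'
    )
    print(software_config(sample))
```

As the behaviour described above suggests, `pythonVersion: '3'` describes the environment's configured Python; on its own it does not show which interpreter a given operator uses to launch the Dataflow job.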