I am using MWAA with the PythonVirtualenvOperator, set up by following the instructions on the AWS site plus this additional fix.

It is working OK: I can have different DAGs install different versions of boto3 within their own virtual environments.

The MWAA FAQs specify that Python version 3.7 is supported for MWAA, but that doesn't clarify if this limitation applies to PythonVirtualenvOperator and pyenv.

I wanted to use the python_version argument of PythonVirtualenvOperator to select a different Python version. When I specify:

python_version="3.8"

I get the following error in the MWAA log:

RuntimeError: failed to find interpreter for Builtin discover of python_spec='python3.8'

Dag Source Code

from airflow import DAG
from airflow.operators.python import PythonVirtualenvOperator
from airflow.utils.dates import days_ago


def virtualenv_fn_5():
    import boto3
    import sys

    print("boto3 version ", boto3.__version__)
    print(f"This is printed from the dag script: {__file__}")
    print(f"Python version: {sys.version}")


with DAG(
    dag_id="virtualenv_test_5",
    schedule_interval=None,
    catchup=False,
    start_date=days_ago(1),
) as dag:
    virtualenv_task = PythonVirtualenvOperator(
        task_id="virtualenv_task_5",
        python_callable=virtualenv_fn_5,
        python_version="3.8",
        requirements=["boto3==1.17.1"],
        system_site_packages=False,
        dag=dag,
    )

Log

*** Reading remote log from Cloudwatch log_group: airflow-mwaa-environment-public-network-MwaaEnvironment-Task log_stream: virtualenv_test_5/virtualenv_task_5/2022-05-18T01_53_13.871659+00_00/1.log.
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1035}} INFO - Dependencies all met for <TaskInstance: virtualenv_test_5.virtualenv_task_5 manual__2022-05-18T01:53:13.871659+00:00 [queued]>
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1035}} INFO - Dependencies all met for <TaskInstance: virtualenv_test_5.virtualenv_task_5 manual__2022-05-18T01:53:13.871659+00:00 [queued]>
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1241}} INFO - 
--------------------------------------------------------------------------------
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1242}} INFO - Starting attempt 1 of 1
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1243}} INFO - 
--------------------------------------------------------------------------------
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1262}} INFO - Executing <Task(PythonVirtualenvOperator): virtualenv_task_5> on 2022-05-18 01:53:13.871659+00:00
[2022-05-18, 01:53:14 UTC] {{standard_task_runner.py:52}} INFO - Started process 799 to run task
[2022-05-18, 01:53:14 UTC] {{standard_task_runner.py:76}} INFO - Running: ['airflow', 'tasks', 'run', 'virtualenv_test_5', 'virtualenv_task_5', 'manual__2022-05-18T01:53:13.871659+00:00', '--job-id', '118', '--raw', '--subdir', 'DAGS_FOLDER/holder_folder/virtualenv_test_5.py', '--cfg-path', '/tmp/tmp46ax8anc', '--error-file', '/tmp/tmpi6bk2idu']
[2022-05-18, 01:53:14 UTC] {{standard_task_runner.py:77}} INFO - Job 118: Subtask virtualenv_task_5
[2022-05-18, 01:53:14 UTC] {{logging_mixin.py:109}} INFO - Running <TaskInstance: virtualenv_test_5.virtualenv_task_5 manual__2022-05-18T01:53:13.871659+00:00 [running]> on host ip-10-12-12-12.ap-southeast-2.compute.internal
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1429}} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=virtualenv_test_5
AIRFLOW_CTX_TASK_ID=virtualenv_task_5
AIRFLOW_CTX_EXECUTION_DATE=2022-05-18T01:53:13.871659+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-05-18T01:53:13.871659+00:00
[2022-05-18, 01:53:14 UTC] {{process_utils.py:135}} INFO - Executing cmd: python3 /usr/local/airflow/.local/lib/python3.7/site-packages/virtualenv /tmp/venvswu6o8zm --python=python3.8
[2022-05-18, 01:53:14 UTC] {{process_utils.py:139}} INFO - Output:
[2022-05-18, 01:53:14 UTC] {{process_utils.py:143}} INFO - RuntimeError: failed to find interpreter for Builtin discover of python_spec='python3.8'
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1703}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
    result = execute_callable(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 365, in execute
    return super().execute(context=serializable_context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 151, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 381, in execute_callable
    requirements=self.requirements,
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/python_virtualenv.py", line 96, in prepare_virtualenv
    execute_in_subprocess(virtualenv_cmd)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/process_utils.py", line 147, in execute_in_subprocess
    raise subprocess.CalledProcessError(exit_code, cmd)
subprocess.CalledProcessError: Command '['python3', '/usr/local/airflow/.local/lib/python3.7/site-packages/virtualenv', '/tmp/venvswu6o8zm', '--python=python3.8']' returned non-zero exit status 1.
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1280}} INFO - Marking task as FAILED. dag_id=virtualenv_test_5, task_id=virtualenv_task_5, execution_date=20220518T015313, start_date=20220518T015314, end_date=20220518T015314
[2022-05-18, 01:53:14 UTC] {{standard_task_runner.py:91}} ERROR - Failed to execute job 118 for task virtualenv_task_5
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
    args.func(args, dag=self.dag)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 292, in task_run
    _run_task_by_selected_method(args, dag, ti)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
    _run_raw_task(args, ti)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 184, in _run_raw_task
    error_file=args.error_file,
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
    result = execute_callable(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 365, in execute
    return super().execute(context=serializable_context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 151, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 381, in execute_callable
    requirements=self.requirements,
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/python_virtualenv.py", line 96, in prepare_virtualenv
    execute_in_subprocess(virtualenv_cmd)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/process_utils.py", line 147, in execute_in_subprocess
    raise subprocess.CalledProcessError(exit_code, cmd)
subprocess.CalledProcessError: Command '['python3', '/usr/local/airflow/.local/lib/python3.7/site-packages/virtualenv', '/tmp/venvswu6o8zm', '--python=python3.8']' returned non-zero exit status 1.
[2022-05-18, 01:53:14 UTC] {{local_task_job.py:154}} INFO - Task exited with return code 1
[2022-05-18, 01:53:15 UTC] {{local_task_job.py:264}} INFO - 0 downstream tasks scheduled from follow-on schedule check
MattG
It's not clearly documented, but MWAA uses 3.7, and it probably only supports 3.7 for `PythonVirtualenvOperator` – 0x26res May 18 '22 at 13:16

1 Answer

As suggested in a comment above by 0x26res, the PythonVirtualenvOperator on MWAA only supports the Python version that MWAA itself runs.

As of 19 May 2022, that version is 3.7.10. Printing it from inside a task:

print(f"Python version: {sys.version}")

Produces:

[2022-05-19, 00:01:46 UTC] {{logging_mixin.py:109}} INFO - Python version: 3.7.10 (default, Jun  3 2021, 00:02:01) 

The only working values for the python_version argument of PythonVirtualenvOperator currently seem to be:

  • 3
  • 3.7
  • 3.7.10

Anything else causes the exception shown above.
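The pattern in that list is that each accepted value is a prefix of the worker's own interpreter version. A minimal sketch of that check follows; `matches_running_interpreter` is a hypothetical helper written for illustration, and it mirrors the behaviour observed here rather than virtualenv's actual discovery logic:

```python
import sys


def matches_running_interpreter(requested, version_info=sys.version_info):
    """Return True if `requested` (e.g. "3", "3.7", "3.7.10") is a
    prefix of the given interpreter version.

    Hypothetical helper: it only reproduces the observation above and
    is not part of Airflow or virtualenv.
    """
    running = tuple(version_info[:3])  # (major, minor, micro)
    parts = tuple(int(p) for p in str(requested).split("."))
    return 0 < len(parts) <= 3 and running[: len(parts)] == parts
```

Called with the MWAA version as `(3, 7, 10)`, this accepts "3", "3.7" and "3.7.10" and rejects "3.8", which matches the values listed above.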

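The reason is that virtualenv does not install interpreters; it can only build an environment from an interpreter that already exists on the worker, and the MWAA worker only has 3.7. In practice, this means the python_version argument of PythonVirtualenvOperator is only useful on MWAA for asserting that a specific Python version is in use. It cannot be used to select a different Python version.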
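To see which interpreters a worker actually offers, a small diagnostic can be run from inside an ordinary task. This is a rough sketch that assumes looking up `python<version>` on PATH is a reasonable approximation of what virtualenv searches for; `find_interpreters` is a hypothetical helper, and virtualenv's real discovery checks more locations than this:

```python
import shutil


def find_interpreters(versions):
    """Map each version string to the path of `python<version>` on PATH,
    or None if no such executable is found.

    Approximation only: virtualenv's built-in discovery is more involved.
    """
    return {v: shutil.which(f"python{v}") for v in versions}


def report_interpreters():
    # Print one line per candidate version so the result shows up in the task log.
    for version, path in find_interpreters(["3.7", "3.8", "3.9"]).items():
        print(f"python{version}: {path or 'not found'}")
```

Given the error in the question, `report_interpreters()` would be expected to report python3.8 as not found on MWAA, although this was not verified here.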

MattG