I am using MWAA with the PythonVirtualenvOperator, installed by following the instructions on the AWS site together with this additional fix.
It works: different DAGs can install different versions of boto3 inside their own virtual environments.
The MWAA FAQs state that MWAA supports Python 3.7, but they do not say whether this limitation also applies to the PythonVirtualenvOperator and pyenv.
I wanted to use the python_version argument of PythonVirtualenvOperator to run tasks under different Python versions. When I specify:
python_version="3.8"
I get the following error in the MWAA log:
RuntimeError: failed to find interpreter for Builtin discover of python_spec='python3.8'
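The error suggests that virtualenv could not locate a `python3.8` executable on the worker. A minimal diagnostic sketch, assuming the lookup is roughly equivalent to searching PATH for `pythonX.Y` (virtualenv's real discovery does more than this), is shown below. The helper name `find_interpreters` and the list of versions are illustrative, not part of Airflow or MWAA.

```python
import shutil
import sys


def find_interpreters(versions):
    """Map each version string to the path of `pythonX.Y` on PATH, or None.

    Hypothetical helper: it only approximates interpreter discovery by
    checking PATH, which is an assumption about how the lookup behaves.
    """
    return {v: shutil.which(f"python{v}") for v in versions}


if __name__ == "__main__":
    # Show the interpreter this script itself runs under.
    print(f"Running under: {sys.version.split()[0]} ({sys.executable})")
    for version, path in find_interpreters(["3.7", "3.8", "3.9"]).items():
        print(f"python{version}: {path or 'not found on PATH'}")
```

Running the same function body inside a plain PythonOperator task would report what the MWAA worker itself can see, which may differ from a local machine.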
DAG source code
from platform import python_version

from airflow import DAG
from airflow.operators.python import PythonVirtualenvOperator
from airflow.utils.dates import days_ago


def virtualenv_fn_5():
    import boto3
    import sys

    print("boto3 version ", boto3.__version__)
    print(f"This is printed from the dag script: {__file__}")
    print(f"Python version: {sys.version}")


with DAG(
    dag_id="virtualenv_test_5",
    schedule_interval=None,
    catchup=False,
    start_date=days_ago(1),
) as dag:
    virtualenv_task = PythonVirtualenvOperator(
        task_id="virtualenv_task_5",
        python_callable=virtualenv_fn_5,
        python_version="3.8",
        requirements=["boto3==1.17.1"],
        system_site_packages=False,
        dag=dag,
    )
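One way to keep such a DAG from failing outright is to request a version only when a matching interpreter appears to exist. This is a sketch under two assumptions: that a PATH lookup is a good enough check, and that leaving python_version as None makes the operator use the interpreter Airflow itself runs under. The helper name `usable_python_version` is hypothetical.

```python
import shutil


def usable_python_version(requested, which=shutil.which):
    """Return `requested` if `python<requested>` is found, otherwise None.

    `which` is injectable so the check can be exercised without depending
    on what is installed locally. Assumption: a PATH lookup is an adequate
    stand-in for the operator's own interpreter discovery.
    """
    return requested if which(f"python{requested}") else None


if __name__ == "__main__":
    version = usable_python_version("3.8")
    print(f"python_version to pass to the operator: {version!r}")
```

If this is called at module level in a DAG file, the check runs wherever the file is parsed, which may not be the same container that later executes the task, so the result should be treated as a hint rather than a guarantee.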
Log
*** Reading remote log from Cloudwatch log_group: airflow-mwaa-environment-public-network-MwaaEnvironment-Task log_stream: virtualenv_test_5/virtualenv_task_5/2022-05-18T01_53_13.871659+00_00/1.log.
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1035}} INFO - Dependencies all met for <TaskInstance: virtualenv_test_5.virtualenv_task_5 manual__2022-05-18T01:53:13.871659+00:00 [queued]>
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1035}} INFO - Dependencies all met for <TaskInstance: virtualenv_test_5.virtualenv_task_5 manual__2022-05-18T01:53:13.871659+00:00 [queued]>
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1241}} INFO -
--------------------------------------------------------------------------------
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1242}} INFO - Starting attempt 1 of 1
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1243}} INFO -
--------------------------------------------------------------------------------
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1262}} INFO - Executing <Task(PythonVirtualenvOperator): virtualenv_task_5> on 2022-05-18 01:53:13.871659+00:00
[2022-05-18, 01:53:14 UTC] {{standard_task_runner.py:52}} INFO - Started process 799 to run task
[2022-05-18, 01:53:14 UTC] {{standard_task_runner.py:76}} INFO - Running: ['airflow', 'tasks', 'run', 'virtualenv_test_5', 'virtualenv_task_5', 'manual__2022-05-18T01:53:13.871659+00:00', '--job-id', '118', '--raw', '--subdir', 'DAGS_FOLDER/holder_folder/virtualenv_test_5.py', '--cfg-path', '/tmp/tmp46ax8anc', '--error-file', '/tmp/tmpi6bk2idu']
[2022-05-18, 01:53:14 UTC] {{standard_task_runner.py:77}} INFO - Job 118: Subtask virtualenv_task_5
[2022-05-18, 01:53:14 UTC] {{logging_mixin.py:109}} INFO - Running <TaskInstance: virtualenv_test_5.virtualenv_task_5 manual__2022-05-18T01:53:13.871659+00:00 [running]> on host ip-10-12-12-12.ap-southeast-2.compute.internal
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1429}} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=virtualenv_test_5
AIRFLOW_CTX_TASK_ID=virtualenv_task_5
AIRFLOW_CTX_EXECUTION_DATE=2022-05-18T01:53:13.871659+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-05-18T01:53:13.871659+00:00
[2022-05-18, 01:53:14 UTC] {{process_utils.py:135}} INFO - Executing cmd: python3 /usr/local/airflow/.local/lib/python3.7/site-packages/virtualenv /tmp/venvswu6o8zm --python=python3.8
[2022-05-18, 01:53:14 UTC] {{process_utils.py:139}} INFO - Output:
[2022-05-18, 01:53:14 UTC] {{process_utils.py:143}} INFO - RuntimeError: failed to find interpreter for Builtin discover of python_spec='python3.8'
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1703}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
    result = execute_callable(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 365, in execute
    return super().execute(context=serializable_context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 151, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 381, in execute_callable
    requirements=self.requirements,
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/python_virtualenv.py", line 96, in prepare_virtualenv
    execute_in_subprocess(virtualenv_cmd)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/process_utils.py", line 147, in execute_in_subprocess
    raise subprocess.CalledProcessError(exit_code, cmd)
subprocess.CalledProcessError: Command '['python3', '/usr/local/airflow/.local/lib/python3.7/site-packages/virtualenv', '/tmp/venvswu6o8zm', '--python=python3.8']' returned non-zero exit status 1.
[2022-05-18, 01:53:14 UTC] {{taskinstance.py:1280}} INFO - Marking task as FAILED. dag_id=virtualenv_test_5, task_id=virtualenv_task_5, execution_date=20220518T015313, start_date=20220518T015314, end_date=20220518T015314
[2022-05-18, 01:53:14 UTC] {{standard_task_runner.py:91}} ERROR - Failed to execute job 118 for task virtualenv_task_5
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
    args.func(args, dag=self.dag)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 292, in task_run
    _run_task_by_selected_method(args, dag, ti)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
    _run_raw_task(args, ti)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 184, in _run_raw_task
    error_file=args.error_file,
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
    result = execute_callable(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 365, in execute
    return super().execute(context=serializable_context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 151, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 381, in execute_callable
    requirements=self.requirements,
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/python_virtualenv.py", line 96, in prepare_virtualenv
    execute_in_subprocess(virtualenv_cmd)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/process_utils.py", line 147, in execute_in_subprocess
    raise subprocess.CalledProcessError(exit_code, cmd)
subprocess.CalledProcessError: Command '['python3', '/usr/local/airflow/.local/lib/python3.7/site-packages/virtualenv', '/tmp/venvswu6o8zm', '--python=python3.8']' returned non-zero exit status 1.
[2022-05-18, 01:53:14 UTC] {{local_task_job.py:154}} INFO - Task exited with return code 1
[2022-05-18, 01:53:15 UTC] {{local_task_job.py:264}} INFO - 0 downstream tasks scheduled from follow-on schedule check