
I am receiving the error below in the task logs when running the DAG:

FileNotFoundError: [Errno 2] No such file or directory: 'beeline': 'beeline'

This is my DAG:

import airflow
from airflow import DAG
from airflow.providers.apache.hive.operators.hive import HiveOperator
from airflow.utils.dates import days_ago
from datetime import timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag_data_summarizer = DAG(
    dag_id="data_summarizer",
    default_args=default_args,
    description='Data summarizer DAG',
    schedule_interval='*/20 * * * *',
    start_date=airflow.utils.dates.days_ago(1)
)

hql_query = """create database if not exists new_test_db;"""

hive_task = HiveOperator(
    hql=hql_query,
    task_id="data_retrieval",
    hive_cli_conn_id="new_hive_conn",
    dag=dag_data_summarizer,
    run_as_user="airflow" # airflow user has beeline executable set in PATH
)

if __name__ == '__main__':
    dag_data_summarizer.cli()

The new_hive_conn connection is of type "hive_cli" (I tried the "hiveserver2" connection type as well; it did not work).

The task log prints the below command: beeline -u "jdbc:hive2://hive-server-1:10000/default;auth=NONE"

When I run this command manually on the worker Docker container, it works and I am connected to the Hive server.

The worker container has the beeline executable configured and set on its PATH for the "airflow" and "root" users: /home/airflow/.local/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/airflow/hive/apache-hive-2.3.2-bin/bin:/home/airflow/hadoop/hadoop-3.3.1/bin
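One way to see which PATH the task process actually gets (and whether beeline resolves on it) is a small diagnostic; this is a hypothetical sketch, not part of the original DAG, and `report_beeline` is a made-up name:

```python
import os
import shutil

# Hypothetical diagnostic: print the PATH visible to the current
# (non-interactive) process and try to resolve 'beeline' on it.
# shutil.which() performs the same lookup that raises the
# FileNotFoundError when the executable is not on PATH.
def report_beeline():
    print("PATH =", os.environ.get("PATH", ""))
    found = shutil.which("beeline")
    print("beeline resolved to:", found)
    return found  # None if beeline is not on this process's PATH

if __name__ == "__main__":
    report_beeline()
```

Running this inside the task (e.g. via a PythonOperator, or with `airflow tasks test`) would show the PATH under which the worker spawns subprocesses, which can differ from the PATH you see in an interactive shell on the same container.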

user9492428

1 Answer


The `run_as_user` feature uses `sudo` to switch to the airflow user in non-interactive mode. A non-interactive `sudo` will never preserve the PATH variable, no matter which parameters you pass (including `-E`), unless you run `sudo` in --interactive mode (logging in as the user). Only in --interactive mode are the user's `.profile`, `.bashrc` and other startup scripts executed, and those are usually the scripts that set PATH for the user.

Every non-interactive `sudo` command instead has its PATH set to the `secure_path` defined in the /etc/sudoers file.

My case here:

secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

You need to add your Hive bin directory to `secure_path` in /etc/sudoers, or copy/symlink beeline into one of the existing "secure" binary paths.
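The lookup mechanics can be simulated without `sudo` at all: a minimal sketch using throwaway /tmp paths (all paths and the fake beeline script here are assumptions for illustration only):

```shell
#!/bin/sh
# Create a fake 'beeline' somewhere OFF the restricted search path,
# mimicking a Hive install under the user's home directory.
mkdir -p /tmp/fakehive/bin
printf '#!/bin/sh\necho connected\n' > /tmp/fakehive/bin/beeline
chmod +x /tmp/fakehive/bin/beeline

# With a restricted PATH (like sudo's secure_path), the lookup fails
# even though the binary exists elsewhere on the machine.
(PATH=/usr/bin:/bin; command -v beeline || echo "not found")

# Fix: link beeline into a directory that IS on the restricted PATH
# (here /tmp/securebin stands in for e.g. /usr/local/bin).
mkdir -p /tmp/securebin
ln -sf /tmp/fakehive/bin/beeline /tmp/securebin/beeline
(PATH=/tmp/securebin:/usr/bin:/bin; command -v beeline)
```

On a real system the equivalent fix is `ln -s .../apache-hive-*/bin/beeline /usr/local/bin/beeline`, or editing the `secure_path` line via `visudo`.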

Jarek Potiuk
  • Understood. But even if I remove the `run_as_user` parameter, I get the same error. What would be the reason for that? – user9492428 Oct 29 '21 at 07:11
  • Similar story: your PATH is currently likely set only in `.bashrc` or similar, and those are parsed only in --interactive mode (basically, when you run as a "real" user via a terminal). In an unattended process only part of the environment variables is set, and your PATH likely is not. See https://stackoverflow.com/questions/42725538/set-environment-variables-for-non-interactive-shell for example. – Jarek Potiuk Oct 29 '21 at 16:28
  • 1
    thanks a lot for your answer. The DAG works now as expected. I extended the official airflow image to download the Hive/Beeline and Hadoop binaries and set them to the `secure_path` in my Dockerfile. Is this required outlined in the official documentation anywhere? Was stuck on this issue for quite a while & could not find a fix in any documentation. – user9492428 Oct 31 '21 at 21:03
  • This is not an Airflow-specific thing; it is standard `sudo` behaviour. But if you think it would be useful to add it to Airflow's docs, maybe you can go to the page where you would expect to find it and add a note? Each page has a "suggest changes on that page" button that opens a GitHub pull request, so you can add your proposal super easily. Airflow is created by ~1800 contributors and you can easily become one of them; it's a cool way to say thanks for the free software you get (most of those people do it in their free time). – Jarek Potiuk Nov 02 '21 at 08:36