I am using BashOperator to run a bash command that requires multiple parameters to work. Among those parameters I am sending a password.

bash_task = BashOperator(
    task_id='bash_operator_example',
    bash_command=f'echo {param1} {password}',
)

The problem is that even though I obtain the password from an Airflow connection, which should keep it secret, the password appears in clear text in the log when the operator runs the command.

See the log generated below:

[2021-03-06 17:42:14,079] {bash_operator.py:136} INFO - Temporary script location: /tmp/airflowtmp_mez12q_/bash_operator_examplena7hxdup
[2021-03-06 17:42:14,079] {bash_operator.py:146} INFO - Running command: echo my_param_1 my_password
[2021-03-06 17:42:14,092] {bash_operator.py:153} INFO - Output:
[2021-03-06 17:42:14,094] {bash_operator.py:157} INFO - my_param_1 my_password
[2021-03-06 17:42:14,094] {bash_operator.py:161} INFO - Command exited with return code 0
[2021-03-06 17:42:14,101] {taskinstance.py:1070} INFO - Marking task as SUCCESS.dag_id=TEST_running_DAG, task_id=bash_operator_example, execution_date=20210101T000000, start_date=20210306T174214, end_date=20210306T174214

I tried a couple of things, for example, using the params available in bash operator like:

bash_task = BashOperator(
    task_id='bash_operator_example',
    bash_command='echo {{ params.param1 }} {{ params.password }}',
    params={
        "param1": param1,
        "password": password
    }
)

but the result is the same: the password still gets printed in the log. I also tried using env. In that case the password is not in the log, but the complete env is shown under "Task Instance Details" in the UI, so the password is visible there as well.

The one thing that seemed to work was to run the bash command in a PythonOperator using the subprocess library. But I think Airflow should have an easier option that I am just not aware of.
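For reference, the subprocess approach could look roughly like the sketch below. It is only an illustration, not the exact code I ran: the function name run_command_with_secret and the PASSWORD variable name are made up for the example, and passing the secret through the child's environment is an assumption about how one might keep it out of the command string.

```python
import os
import subprocess


def run_command_with_secret(command, secret, env_name="PASSWORD"):
    """Run `command` in bash, exposing `secret` only as an environment variable.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Copy the current environment and add the secret under env_name.
    env = dict(os.environ)
    env[env_name] = secret

    # The command string references $PASSWORD, so the secret itself never
    # appears in the text of the command. Output is captured, not logged.
    result = subprocess.run(
        ["bash", "-c", command],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    # Return only the exit code, so the secret cannot leak through the
    # return value of the callable.
    return result.returncode
```

A function like this can be used as the python_callable of a PythonOperator. Fetching the secret inside the callable, rather than passing it in as an operator argument, presumably also keeps it out of the task's displayed arguments.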

I would appreciate it if anybody with Airflow experience could point me in the right direction.

2021-03-07 UPDATE:

In the end, I wrote my own operator. I still think there must be an easier solution, as this should be quite a common use case. This is what I did:

Create a new operator. This guide explains how to author your own:

https://airflow.apache.org/docs/apache-airflow/stable/howto/custom-operator.html

Then, I basically replaced both __init__ and execute with the code from the original BashOperator, which you can find here:

https://airflow.apache.org/docs/apache-airflow/1.10.3/_modules/airflow/operators/bash_operator.html

Finally, I added a new password parameter to the __init__ method, and used it to fill in a placeholder in self.bash_command:

with NamedTemporaryFile(dir=tmp_dir, prefix=self.task_id) as f:
    # Custom code to replace the password placeholder
    final_bash_command = (
        self.bash_command.replace(':password', self.password)
        if self.password else self.bash_command
    )

    f.write(bytes(final_bash_command, 'utf_8'))
    ...

Basically, if the password parameter is set, the operator searches the command for the placeholder :password and replaces it with the value of password.

Not the most elegant solution, but it did the trick: later in the code, the log prints self.bash_command and not my new variable final_bash_command, which is what actually gets executed.
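Outside Airflow, the same idea (log the template, execute the substituted command) can be sketched as follows. This is a simplified stand-in, not the operator's code: run_masked and PLACEHOLDER are hypothetical names, and the script is run with a plain subprocess call instead of the operator's own process handling.

```python
import logging
import subprocess
from tempfile import NamedTemporaryFile

log = logging.getLogger(__name__)

# Hypothetical constant mirroring the ':password' placeholder used above.
PLACEHOLDER = ":password"


def run_masked(bash_command, password=None):
    """Log the command with its placeholder, but execute it with the password."""
    # Substitute the placeholder only when a password was supplied.
    final_bash_command = (
        bash_command.replace(PLACEHOLDER, password) if password else bash_command
    )

    # Only the unsubstituted template is logged.
    log.info("Running command: %s", bash_command)

    # Write the substituted command to a temporary script and run it.
    with NamedTemporaryFile(mode="w", suffix=".sh") as f:
        f.write(final_bash_command)
        f.flush()
        result = subprocess.run(["bash", f.name], capture_output=True, text=True)
    return result.returncode
```

Note that this only hides the password in the "Running command" line. If the command itself prints the password, as echo does, it still shows up in the output.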

This is how I use the new Operator:

hello_task = PasswordBashOperator(
        task_id='sample-task',
        bash_command='echo {{ params.param1 }} :password',
        params={
            "param1": param1
        },
        password=password
)

Now, if I run the task it prints:

[2021-03-07 16:16:55,786] {password_bash_operator.py:64} INFO - Temporary script location: /tmp/airflowtmpfos7yapf/sample-task_da9_lrr
[2021-03-07 16:16:55,786] {password_bash_operator.py:74} INFO - Running command: echo my_param_1 :password
[2021-03-07 16:16:55,792] {password_bash_operator.py:83} INFO - Output:
[2021-03-07 16:16:55,793] {password_bash_operator.py:87} INFO - my_param_1 my_password
[2021-03-07 16:16:55,793] {password_bash_operator.py:91} INFO - Command exited with return code 0
[2021-03-07 16:16:55,799] {taskinstance.py:1070} INFO - Marking task as SUCCESS.dag_id=TEST_running_DAG, task_id=sample-task, execution_date=20210101T000000, start_date=20210307T161655, end_date=20210307T161655

So, the line:

[2021-03-07 16:16:55,786] {password_bash_operator.py:74} INFO - Running command: echo my_param_1 :password

does not print the password anymore.

I would still be interested if anybody can find a better and more elegant solution to this.

Ruli

1 Answer

I also had this problem and managed to solve it a different way. I thought I would share what I did as this was the first post I found when looking for a solution.

I used the env argument in the BashOperator to set a variable that can be used by the bash command like so:

PASSWORD = "my_password"

# The variable name in bash_command must match the key in env
some_task = BashOperator(task_id="push_files",
                         bash_command="echo $PASSWORD",
                         env={'PASSWORD': PASSWORD})

This results in the Airflow log not displaying the password string when it prints to the log the command that will be run:

[2021-11-29 11:19:07,527] {bash_operator.py:147} INFO - Running command: echo $PASSWORD
[2021-11-29 11:19:07,674] {bash_operator.py:154} INFO - Output: 
[2021-11-29 11:19:07,678] {bash_operator.py:158} INFO - my_password

I'm using Airflow 1.10.15 as part of GCP's Cloud Composer and found this solution in the docs.

Gamboge