I am using BashOperator to run a bash command that requires multiple parameters to work. Among those parameters I am sending a password.
bash_task = BashOperator(
    task_id='bash_operator_example',
    bash_command=f'echo {param1} {password}',
)
The problem is that, even though I obtain the password from an Airflow connection, which should keep it secret, the password appears in clear text in the log when the operator runs the command.
See the log generated below:
[2021-03-06 17:42:14,079] {bash_operator.py:136} INFO - Temporary script location: /tmp/airflowtmp_mez12q_/bash_operator_examplena7hxdup
[2021-03-06 17:42:14,079] {bash_operator.py:146} INFO - Running command: echo my_param_1 my_password
[2021-03-06 17:42:14,092] {bash_operator.py:153} INFO - Output:
[2021-03-06 17:42:14,094] {bash_operator.py:157} INFO - my_param_1 my_password
[2021-03-06 17:42:14,094] {bash_operator.py:161} INFO - Command exited with return code 0
[2021-03-06 17:42:14,101] {taskinstance.py:1070} INFO - Marking task as SUCCESS.dag_id=TEST_running_DAG, task_id=bash_operator_example, execution_date=20210101T000000, start_date=20210306T174214, end_date=20210306T174214
I tried a couple of things, for example, using the params available in bash operator like:
bash_task = BashOperator(
    task_id='bash_operator_example',
    bash_command='echo {{ params.param1 }} {{ params.password }}',
    params={
        "param1": param1,
        "password": password
    }
)
but the result is the same: the password still gets printed in the log. I also tried using env. The password then no longer appears in the log, but the complete env is shown under "Task Instance Details" in the UI, so the password is visible there as well.
The one thing that seemed to work was running the bash command from a PythonOperator using the subprocess library. But I think Airflow should have an easier option that I am just not aware of.
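For reference, a minimal sketch of the subprocess approach, not the exact code I ran. The function name run_bash_with_secret and the variable MY_PASSWORD are made up for illustration. The idea is that the command text only contains a "$MY_PASSWORD" reference, so logging the command does not reveal the value:

```python
import logging
import os
import subprocess

log = logging.getLogger(__name__)


def run_bash_with_secret(command, secret_env):
    """Run `command` in bash, exposing secrets only as environment variables.

    `command` should refer to secrets as "$NAME"; `secret_env` maps each
    NAME to its value. Both names are hypothetical, chosen for this sketch.
    """
    # Only the command template is logged; it holds "$NAME", not the value.
    log.info("Running command: %s", command)
    # Extend the current environment with the secrets for this one process.
    env = {**os.environ, **secret_env}
    result = subprocess.run(
        ["bash", "-c", command],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

In a DAG this function would be called from the python_callable of a PythonOperator, ideally fetching the password inside the callable. Note that the command's own output is returned, not logged, so a command that echoes the password would still leak it if the caller logs the result.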
I would appreciate it if anybody with Airflow experience could point me in the right direction.
2021-03-07 UPDATE:
In the end, I wrote my own operator. I still think there must be an easier solution, as this should be quite a common use case. This is what I did:
Create a new operator. This guide explains how to author your own:
https://airflow.apache.org/docs/apache-airflow/stable/howto/custom-operator.html
Then I basically replaced both __init__ and execute with the code of the base BashOperator, which you can find here:
https://airflow.apache.org/docs/apache-airflow/1.10.3/_modules/airflow/operators/bash_operator.html
Finally, I added a new password parameter to the __init__ method, and used it to substitute a placeholder in self.bash_command, like this:
with NamedTemporaryFile(dir=tmp_dir, prefix=self.task_id) as f:
    # Custom code: swap the ':password' placeholder for the real password
    final_bash_command = (
        self.bash_command.replace(':password', self.password)
        if self.password else self.bash_command
    )
    f.write(bytes(final_bash_command, 'utf_8'))
    ...
Basically, if the password parameter is set, the operator searches for the placeholder :password and replaces it with the value of password.
Not the most elegant solution, but it did the trick: later in the code, the log prints self.bash_command rather than my new variable final_bash_command, which is what actually gets executed.
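The same placeholder idea can be sketched outside Airflow as a standalone function. This is an illustration under assumptions, not the operator itself: run_with_placeholder and its arguments are hypothetical names, and the script is run with plain subprocess instead of the operator's own process handling.

```python
import logging
import subprocess
from tempfile import NamedTemporaryFile

log = logging.getLogger(__name__)


def run_with_placeholder(bash_command, password=None, placeholder=":password"):
    """Log the command template, but execute it with the password filled in."""
    # Substitute the secret only in the text written to the script file.
    final_bash_command = (
        bash_command.replace(placeholder, password) if password else bash_command
    )
    with NamedTemporaryFile(prefix="sketch_") as f:
        f.write(final_bash_command.encode("utf-8"))
        f.flush()  # make sure bash sees the full script on disk
        # The log line uses the original template, placeholder included.
        log.info("Running command: %s", bash_command)
        result = subprocess.run(
            ["bash", f.name], capture_output=True, text=True, check=True
        )
    return result.stdout
```

Two caveats with this approach: a password containing shell metacharacters would need quoting (for example with shlex.quote) before substitution, and anything the command itself prints still ends up in the output.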
This is how I use the new Operator:
hello_task = PasswordBashOperator(
    task_id='sample-task',
    bash_command='echo {{ params.param1 }} :password',
    params={
        "param1": param1
    },
    password=password
)
Now, if I run the task it prints:
[2021-03-07 16:16:55,786] {password_bash_operator.py:64} INFO - Temporary script location: /tmp/airflowtmpfos7yapf/sample-task_da9_lrr
[2021-03-07 16:16:55,786] {password_bash_operator.py:74} INFO - Running command: echo my_param_1 :password
[2021-03-07 16:16:55,792] {password_bash_operator.py:83} INFO - Output:
[2021-03-07 16:16:55,793] {password_bash_operator.py:87} INFO - my_param_1 my_password
[2021-03-07 16:16:55,793] {password_bash_operator.py:91} INFO - Command exited with return code 0
[2021-03-07 16:16:55,799] {taskinstance.py:1070} INFO - Marking task as SUCCESS.dag_id=TEST_running_DAG, task_id=sample-task, execution_date=20210101T000000, start_date=20210307T161655, end_date=20210307T161655
So, the line:
[2021-03-07 16:16:55,786] {password_bash_operator.py:74} INFO - Running command: echo my_param_1 :password
does not print the password anymore.
I would still be interested if anybody can find a better and more elegant solution to this.