
I'm wondering what Airflow offers in terms of audit logs. My Airflow environment is running Airflow version 1.10 and uses the [ldap] section of the airflow.cfg file to authenticate against my company's Active Directory (AD). I see that when someone logs into Airflow through the Web UI, it writes the user's name into the webserver's log (shown below). I'm wondering, though, whether Airflow can be modified to also log when a user turns a DAG on or off, creates a new Airflow Variable or Pool, clears a Task, marks a Task as Success, or performs any other operation available to a user.

I need some form of traceability for users' activities: to use Airflow at work, I have to get it through a security review by an Architect, and he requires the ability to trace what each user does.

Is this ability offered out of the box by Airflow? I see that if I were to go with Google Cloud's Airflow service called Cloud Composer then I would get Audit Logs through their service but unfortunately I'm tied to the Amazon Web Services (AWS) ecosystem and I am maintaining Airflow myself (not provided through a service).

In the Airflow webserver logs I can see that navigating the Airflow Web UI generates HTTP requests:

161.179.215.170 - - [17/Sep/2018:16:39:26 -0400] "GET /admin/ HTTP/1.1" 200 71942 "http://1.2.3.4:8080/admin/airflow/graph?dag_id=ARL_OnDemand" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"

and when I log in I see it tells me the username (which is logged in the login function here https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/auth/backends/ldap_auth.py)

[2018-09-17 16:27:15,493] {ldap_auth.py:287} INFO - User foobaruser successfully authenticated
161.179.215.170 - - [17/Sep/2018:16:27:16 -0400] "POST /admin/airflow/login HTTP/1.1" 302 221 "http://1.2.3.4:8080/admin/airflow/login?next=%2Fadmin%2F" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"

So I'm wondering if there's a way for me to update the webserver logs so that every time a GET or POST request is logged, the user who sent the request is logged as well. This would satisfy my audit log needs, because I would always know which user did what in the Airflow UI.
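To make the gap concrete, here is a minimal sketch that parses one of the access log lines quoted above. It assumes the webserver writes the common "combined" access log layout (host, ident, authenticated user, timestamp, request, status, size, referer, user agent), which is what the sample lines appear to follow. The names `ACCESS_LINE` and `parse_access_line` are made up for this illustration and are not part of Airflow.

```python
import re

# Assumed layout (combined log format):
# host ident authuser [time] "request" status bytes "referer" "user-agent"
ACCESS_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line):
    """Return the fields of one access log line as a dict, or None if it does not match."""
    match = ACCESS_LINE.match(line)
    return match.groupdict() if match else None


# The login request from the question, copied from the webserver log.
sample = (
    '161.179.215.170 - - [17/Sep/2018:16:27:16 -0400] '
    '"POST /admin/airflow/login HTTP/1.1" 302 221 '
    '"http://1.2.3.4:8080/admin/airflow/login?next=%2Fadmin%2F" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"'
)

record = parse_access_line(sample)
# The third field is "-", i.e. the access log line itself carries no user name.
print(record["user"], record["method"], record["path"])
```

In these lines the user field is just `-`, so the access log alone can only tie a request to an IP address, not to the LDAP user who made it.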

Update:

In this article

https://wecode.wepay.com/posts/improving-airflow-ui-security

Apparently Airflow 1.10 has introduced a whole new Website Security architecture and they will be deprecating the original Flask UI in the future.

The part I found interesting and relevant to this post is where the author describes action logging as passive rather than preemptive. I wonder if that's related to audit logging?

During this time, several improvements were made on security, including adding an action logging feature and creating a hard-coded naive RBAC implementation. However, the action logging was passive rather than preemptive, and the native RBAC implementation still allowed read and write access to DAGs for all roles, so they didn’t address our security concerns.

WORKING SOLUTION:

Although I said I was on Airflow version 1.10, I was actually on Airflow version 1.9 :) On Airflow version 1.9 the Owner column in the Logs view was always blank for me unless it said Airflow. But after upgrading to Airflow version 1.10 and connecting to my LDAP, I now see my LDAP username (kbridenstine) logged under Owner every time I run a modifying command!

[Screenshot: Airflow Logs view with the LDAP username in the Owner column]

And as the icing on the cake, Airflow also logs when someone on the server runs an Airflow command (because you can modify Airflow via its CLI commands too). You can see this with the root and ec2-user accounts I was using on the EC2 instance running Airflow.

Kyle Bridenstine

1 Answer


I think the logs under AIRFLOW_WEB_SERVER_URL:PORT/admin/log/ should give you enough information, e.g. whether someone cleared a DAG using the UI or the CLI, as shown in the screenshot below.

Some of this metadata is retrieved from the MetaDB.
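Since the Logs view is backed by the metadata database, the same records can in principle be queried directly. The sketch below is only an illustration under stated assumptions: it builds an in-memory SQLite table whose column names follow my understanding of the Airflow 1.10 `log` table (verify them against your own metadata database), and the rows and event names are invented sample data. The helper `events_by_owner` is hypothetical and not an Airflow API.

```python
import sqlite3

# Stand-in for the metadata database; a real deployment would connect to
# whatever database sql_alchemy_conn points at.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE log (
        id INTEGER PRIMARY KEY,
        dttm TEXT,
        dag_id TEXT,
        task_id TEXT,
        event TEXT,
        execution_date TEXT,
        owner TEXT,
        extra TEXT
    )
    """
)

# Invented sample rows; event names are illustrative, not an exhaustive list.
conn.executemany(
    "INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2018-09-25 19:40:00", "ARL_OnDemand", None, "paused", None, "kbridenstine", None),
        ("2018-09-25 19:42:10", "ARL_OnDemand", None, "clear", None, "kbridenstine", None),
        ("2018-09-25 19:45:30", "ARL_OnDemand", None, "cli_backfill", None, "ec2-user", None),
    ],
)


def events_by_owner(connection, owner):
    """Return (dttm, event, dag_id) tuples recorded for one owner, oldest first."""
    cursor = connection.execute(
        "SELECT dttm, event, dag_id FROM log WHERE owner = ? ORDER BY dttm",
        (owner,),
    )
    return cursor.fetchall()


for row in events_by_owner(conn, "kbridenstine"):
    print(row)
```

A query like this could feed a periodic export for a security review, provided the Owner column is actually populated with the authenticated user in your setup.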

[Screenshot: Airflow Logs view at /admin/log/]

kaxil
  • My logs do not have the hostname in them like yours do in the far-right column. As for the second column from the right, which says kaxil and anonymous, that's just the DAG owner name, which is usually Airflow by default, so that field doesn't help. I'm wondering if your logs show the hostname because you're using Airflow authentication with a username/password created through Airflow, whereas I'm using LDAP? Also, it looks like you're running locally, so I wonder if that's affecting things. – Kyle Bridenstine Sep 21 '18 at 12:27
  • I have not listed 'kaxil' as the DAG owner. And yes, I am running it locally and not using LDAP. – kaxil Sep 21 '18 at 12:31
  • Your answer is right! I'll explain why it took so long to verify in an update to my post :) thank you! I awarded you the 50 point bounty! – Kyle Bridenstine Sep 25 '18 at 19:58
  • @kaxil Unfortunately, in this view I can't see the other thing the OP asked about: who edited an Airflow Variable and what the key was. Is it possible to show that in this Audit Log list? – elaspog Oct 15 '21 at 14:12