96

I have started the Airflow webserver and scheduled some DAGs. I can see the DAGs in the web GUI.

How can I delete a particular DAG so that it is no longer run or shown in the web GUI? Is there an Airflow CLI command to do that?

I looked around but could not find an answer for a simple way of deleting a DAG once it has been loaded and scheduled.

subba
  • There is no CLI for this. But there is a pull request that was abandoned if you wanted to try and revive it: https://github.com/apache/incubator-airflow/pull/1344 – TheF1rstPancake Nov 18 '16 at 04:06
  • 1
    In Airflow versions < 1.10, it's a two-step process: 1. Remove the DAG file from the /airflow/dags/ folder. This removes the DAG from the output of the `airflow list_dags` command, but the GUI still shows it because its state is still active. To remove it from the GUI, follow step 2. 2. Connect to the MySQL instance of the Airflow cluster and open the database named "airflow". In it, find the table named "dag". Running `describe dag` shows a field named "is_active", which is set to 1. Run a MySQL update to set it to 0. Now refresh the GUI and the DAG is gone. – codninja0908 Dec 03 '19 at 08:05
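A minimal sketch of step 2 from the comment above, for illustration only: it runs the same `is_active` update, but against an in-memory SQLite stand-in for the `dag` table (two assumed columns, not Airflow's real schema) so that it is self-contained. On the MySQL metadata database from the comment you would run the equivalent UPDATE through your MySQL client; the helper name `deactivate_dag` is hypothetical.

```python
import sqlite3


def deactivate_dag(conn: sqlite3.Connection, dag_id: str) -> int:
    """Set is_active = 0 for one DAG and return the number of rows changed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE dag SET is_active = 0 WHERE dag_id = ?", (dag_id,)
        )
    return cur.rowcount


# Simplified stand-in for the metadata "dag" table (assumed columns only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_active INTEGER)")
conn.executemany(
    "INSERT INTO dag VALUES (?, ?)", [("my_dag_id", 1), ("other_dag", 1)]
)
conn.commit()

changed = deactivate_dag(conn, "my_dag_id")
print(changed)  # 1
```

According to the comment, this only stops the GUI from listing the DAG; its run history stays in the other tables.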

19 Answers

78

Edit 8/27/18 - Airflow 1.10 is now released on PyPI!

https://pypi.org/project/apache-airflow/1.10.0/


How to delete a DAG completely

We have this feature now in Airflow ≥ 1.10!

The PR #2199 (Jira: AIRFLOW-1002) adding DAG removal to Airflow has now been merged, which allows fully deleting a DAG's entries from all of the related tables.

The core delete_dag(...) code is now part of the experimental API, and there are entrypoints available via the CLI and also via the REST API.

CLI:

airflow delete_dag my_dag_id

REST API (running webserver locally):

curl -X "DELETE" http://127.0.0.1:8080/api/experimental/dags/my_dag_id

Warning regarding the REST API: Ensure that your Airflow cluster uses authentication in production.
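For scripting the same REST call from Python, a minimal stdlib-only sketch is below. It targets the experimental endpoint shown above, but only builds the request and does not send it; the helper name `build_delete_dag_request` is hypothetical, and any authentication headers your deployment needs are not included.

```python
from urllib.parse import quote
from urllib.request import Request


def build_delete_dag_request(base_url: str, dag_id: str) -> Request:
    """Build (but do not send) a DELETE request for the 1.10 experimental API."""
    # safe="" makes quote() escape every reserved character, including "/"
    path = "/api/experimental/dags/" + quote(dag_id, safe="")
    return Request(base_url.rstrip("/") + path, method="DELETE")


req = build_delete_dag_request("http://127.0.0.1:8080", "my_dag_id")
print(req.get_method(), req.full_url)
# DELETE http://127.0.0.1:8080/api/experimental/dags/my_dag_id
```

To actually send it, pass the request to `urllib.request.urlopen(req)`; an error status from the server is raised as `urllib.error.HTTPError`.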

Installing / upgrading to Airflow 1.10 (current)

To upgrade, first set one of these environment variables:

export SLUGIFY_USES_TEXT_UNIDECODE=yes

or:

export AIRFLOW_GPL_UNIDECODE=yes

Then:

pip install -U apache-airflow

Remember to check UPDATING.md first for the full details!

Taylor D. Edmiston
  • @jack Good question – I just added the command to the answer. The code was added after 1.9 so it won't be available there. – Taylor D. Edmiston Jul 19 '18 at 19:16
  • I think the curl command is: `curl -X "DELETE" http://127.0.0.1:8080/api/experimental/dags/my_dag_id` – Mike Jul 30 '18 at 20:45
  • 1
    @Mike Good catch. Just fixed it. Thank you! – Taylor D. Edmiston Jul 30 '18 at 20:53
  • 2
    Airflow 1.10.1 now has added the ability to delete a DAG from the web UI – Alex Nov 28 '18 at 09:53
  • 2
    This gives me `airflow.exceptions.DagFileExists: Dag id example_bash_operator is still in DagBag. Remove the DAG file first`. – akki May 09 '19 at 08:41
  • 3
    @akki Deleting a DAG via the API or UI only removes the DAG's history from the database tables, not the DAG file itself, so it's better to delete your DAG's .py file first if your goal is to not have the DAG run again. – Taylor D. Edmiston May 09 '19 at 14:32
  • @TaylorEdmiston Ohh, thanks for the comment. For me, that was neither intuitive nor could I find it anywhere in the documentation. On top of that, just by installing Airflow I got 16 example DAGs (so I got a cluttered UI instead of a fresh one), for which I had no idea where their DAG files are! – akki May 10 '19 at 15:23
  • 1
    @akki Yeah, the example DAGs are weird because they're built-in. In your `airflow.cfg` config file under the `[core]` group, I would recommend setting `load_examples` to `False` for a production instance. That's the equivalent of removing their DAG files for the examples. [more info](https://github.com/apache/airflow/blob/220c1621702a61b661b1a16c45944812330ca2cf/airflow/config_templates/default_airflow.cfg#L142-L145) – Taylor D. Edmiston May 10 '19 at 21:13
  • @TaylorEdmiston Do you know any way of setting this config through environment variables? Actually, I have prepared a Docker file for running Airflow in production and was wondering if I can set this config in the Docker file or through Docker-compose. Thanks a lot again for all the help till now. – akki May 12 '19 at 04:31
  • 1
    @akki Yes, you can definitely `ADD` your airflow.cfg to your Dockerfile in your `$AIRFLOW_HOME` directory and it should just work. You can also set it via env var by setting `AIRFLOW__CORE__LOAD_EXAMPLES=False` (note the double underscores). Some people find the env var approach simpler for just changing a few settings but just make sure that you set them consistently across your scheduler, webserver, and worker nodes. More info: http://incubator-airflow.readthedocs.io/en/latest/configuration.html. This should work in Compose too - https://docs.docker.com/compose/environment-variables/. – Taylor D. Edmiston May 12 '19 at 13:58
  • @TaylorEdmiston Thanks a lot :o - very kind of you. :) – akki May 13 '19 at 07:35
  • FYI, I discovered a weird corner case related to this. If you have a file like `mydag.py` which also has an error in the file (one that would show in the UI), and you then rename it to `mydag_dag.py` to conform to the general standard of having `_dag.py` in the name, the error message associated with the original file will continue to show up in the UI, even across restarts (at least in 1.10.15). The solution is to delete the NEW dag through the UI button without actually deleting its file. Once it gets reloaded (because the file is still there) the error message will go away. – Stephen Sep 01 '21 at 23:25
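The environment-variable form mentioned in the comments above follows a fixed naming pattern, `AIRFLOW__{SECTION}__{KEY}` with double underscores as separators. A small sketch that derives the variable name for any airflow.cfg setting; the helper `airflow_config_env_var` is hypothetical and only illustrates the naming rule.

```python
def airflow_config_env_var(section: str, key: str) -> str:
    """Return the environment-variable name that overrides airflow.cfg [section] key."""
    # Airflow reads AIRFLOW__{SECTION}__{KEY}; note the double underscores
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


name = airflow_config_env_var("core", "load_examples")
print(f"{name}=False")  # AIRFLOW__CORE__LOAD_EXAMPLES=False
```

In a Dockerfile the printed line could be used as `ENV AIRFLOW__CORE__LOAD_EXAMPLES=False`, set the same way for the scheduler, webserver and workers.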
22

This is my adapted code using PostgresHook with the default connection_id.

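import sys
from airflow.hooks.postgres_hook import PostgresHook

# Usage: python script.py my_dag_id
dag_input = sys.argv[1]
# The "airflow_db" connection must point at the Airflow metadata database
hook = PostgresHook(postgres_conn_id="airflow_db")

# Delete from the child tables first and from "dag" last
for t in ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]:
    # Table names come from the fixed list above; the DAG id is a bound parameter
    sql = "delete from {} where dag_id = %s".format(t)
    hook.run(sql, autocommit=True, parameters=(dag_input,))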
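Because the table names are interpolated into the SQL (identifiers cannot be bound as query parameters), it can help to preview exactly what the loop will run before pointing it at the metadata database. A minimal dry-run sketch; `build_delete_statements` is a hypothetical helper, and the `%s` placeholder style assumes a psycopg2-backed hook.

```python
# Tables cleaned up by the script above, child tables first and "dag" last
TABLES = ("xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag")


def build_delete_statements(dag_id, tables=TABLES):
    """Return (sql, params) pairs without touching the database (a dry run)."""
    statements = []
    for table in tables:
        # Identifiers cannot be bound as parameters, so only allow-listed names are used
        if table not in TABLES:
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append((f"delete from {table} where dag_id = %s", (dag_id,)))
    return statements


for sql, params in build_delete_statements("my_dag_id"):
    print(sql, params)
```

Once the preview looks right, each pair could be passed to `hook.run(sql, autocommit=True, parameters=params)`.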
19

DAGs can be deleted in Airflow 1.10, but the process and the sequence of actions must be right. There's a chicken-and-egg problem: if you delete the DAG from the frontend while the file is still there, the DAG is reloaded (because the file was not deleted). If you delete the file first and refresh the page, the DAG can no longer be deleted from the web GUI. So the sequence of actions that let me delete a DAG from the frontend was:

  1. Delete the DAG file (in my case, delete it from the pipeline repository and deploy to the Airflow servers, especially the scheduler).
  2. DO NOT refresh the web GUI.
  3. In the web GUI, in the DAGs view (the normal front page), click on the "Delete dag" icon, the red icon on the far right.
  4. This cleans up all the remains of this DAG from the database.
Sven
14

Not sure why Apache Airflow doesn't have an obvious and easy way to delete a DAG.

Filed https://issues.apache.org/jira/browse/AIRFLOW-1002

Tagar
  • 5
    The PR for this is open but hasn't yet been merged. The link for those interested - https://github.com/apache/incubator-airflow/pull/2199. – Taylor D. Edmiston Aug 15 '17 at 03:07
12

I just wrote a script that deletes everything related to a particular DAG, but it only works with MySQL. You can write a different connector method if you are using PostgreSQL. The commands were originally posted by Lance on https://groups.google.com/forum/#!topic/airbnb_airflow/GVsNsUxPRC0; I just put them in a script. Hope this helps. Usage: python script.py dag_id

import sys
import MySQLdb

# Usage: python script.py my_dag_id
dag_input = sys.argv[1]

# A list (not a set) so the deletes run in a fixed order, with "dag" last
tables = ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]


def delete_dag_records(dag_id):
    db = MySQLdb.connect(host="hostname", user="username", passwd="password", db="database")
    try:
        cur = db.cursor()
        for table in tables:
            # Table names come from the fixed list; the DAG id is passed as a parameter
            query = "delete from {} where dag_id = %s".format(table)
            print(query, dag_id)
            cur.execute(query, (dag_id,))
        # Commit once at the end, after all deletes have succeeded
        db.commit()
    finally:
        db.close()


delete_dag_records(dag_input)
Oleg Yamin
  • I noticed there is a pickle_id in the `dag` table. Should we maybe also do `delete from dag_pickle where id = (select pickle_id from public.dag where dag_id = 'my_dag_id')` before we delete from the `dag` table? – André C. Andersen Jan 05 '19 at 12:36
8

Airflow 1.10.1 has been released. This release adds the ability to delete a DAG from the web UI after you have deleted the corresponding DAG file from the file system.

See this ticket for more details:

[AIRFLOW-2657] Add ability to delete DAG from web ui

[Screenshot: Airflow Links menu with delete icon]

Please note that this doesn't actually delete the DAG file from the file system; you need to do that manually first, otherwise the DAG will get reloaded.

Alex
  • It works if you have deleted the actual DAG file. If the DAG is still there, it will be reloaded – Alex Jan 29 '19 at 21:10
  • 2
    This gives me `Dag id example_bash_operator is still in DagBag. Remove the DAG file first.`. – akki May 09 '19 at 08:41
  • 1
    You need to remove the Dag file from the file system first. – Alex May 10 '19 at 10:07
  • @Jaco 's comment is helpful. The error `Dag id example_bash_operator is still in DagBag. Remove the DAG file first.` disappears after removing DAG .py file from dags directory. – Tomáš Záluský Jun 07 '19 at 23:33
  • It is a very useful feature! However I wanted to delete a DAG in order to just remove the history and re-add it immediately. Airflow did not accept a DAG with the same filename. I had to change the filename of the DAG and then Airflow recognized it as a new DAG (of the same name and same parameters). – peschü Jan 14 '20 at 16:16
  • Just for testing, I did this to all my functional DAGS. Now, even with literally no change to any files, the DAGS will not appear again on the UI - why is this and how can I fix it? – nate Aug 17 '20 at 04:45
  • Airflow is configured by default to refresh DAGs unless you have disabled this in airflow.cfg: https://stackoverflow.com/questions/51558313/what-is-the-difference-between-min-file-process-interval-and-dag-dir-list-interv – Alex Aug 24 '20 at 00:26
6

I've written a script that deletes all metadata related to a specific DAG from the default SQLite DB. It is based on Jesus's answer above, but adapted from Postgres to SQLite. Change ../airflow.db so that it points at your airflow.db file (by default in ~/airflow); note that a relative path is resolved from the directory you run the script in, not from where script.py is stored. To execute, use python script.py dag_id.

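import sqlite3
import sys

# Path to the metadata DB (default location: ~/airflow/airflow.db)
conn = sqlite3.connect('../airflow.db')
c = conn.cursor()

dag_input = sys.argv[1]

for t in ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]:
    # Table names come from the fixed list; the DAG id is bound with a "?" placeholder
    query = "delete from {} where dag_id = ?".format(t)
    c.execute(query, (dag_input,))

conn.commit()
conn.close()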
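Since these deletes cannot be undone, one option is to snapshot the SQLite file first. Python's `sqlite3.Connection.backup()` (available since Python 3.7) copies a live database into another database file. The sketch below demonstrates it on a throwaway in-memory database rather than the real airflow.db; the helper name `backup_sqlite_db` and the `.bak` file name are just examples.

```python
import os
import sqlite3
import tempfile


def backup_sqlite_db(src: sqlite3.Connection, dest_path: str) -> None:
    """Copy the whole database behind `src` into the file at `dest_path`."""
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # online, page-by-page copy of the source database
    finally:
        dest.close()


# Demo on a throwaway database instead of the real airflow.db
tmpdir = tempfile.mkdtemp()
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY)")
src.execute("INSERT INTO dag VALUES ('my_dag_id')")
src.commit()

backup_path = os.path.join(tmpdir, "airflow.db.bak")
backup_sqlite_db(src, backup_path)

restored = sqlite3.connect(backup_path)
print(restored.execute("SELECT dag_id FROM dag").fetchall())  # [('my_dag_id',)]
```

With the real database, you would call `backup_sqlite_db(conn, '../airflow.db.bak')` right after opening `conn` and before the delete loop.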
jeff
3

For those who have direct access to the Postgres psql console of the Airflow metadata DB, you can simply execute the following statements to remove the DAG:

\set dag_id YOUR_DAG_ID

delete from xcom where dag_id=:'dag_id';
delete from task_instance where dag_id=:'dag_id';
delete from sla_miss where dag_id=:'dag_id';
delete from log where dag_id=:'dag_id';
delete from job where dag_id=:'dag_id';
delete from dag_run where dag_id=:'dag_id';
delete from dag where dag_id=:'dag_id';

A similar (with minor changes) query is suitable for other databases, such as MySQL and SQLite.

lucidyan
1

There is nothing built into Airflow that does that for you. To delete the DAG, delete its file from the repository and delete its entries from the dag table in the Airflow metadata database.

kvb
  • I also had to reboot the machine on which the scheduler and webserver are running to finish the cleanup. Simply restarting the webserver and scheduler was insufficient. – Jean-Christophe Rodrigue Nov 02 '17 at 23:21
0

versions >= 1.10.0:

I have Airflow version 1.10.2 and I tried running the airflow delete_dag command, but it throws the following error:

bash-4.2# airflow delete_dag dag_id

[2019-03-16 15:37:20,804] {settings.py:174} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, pid=28224
/usr/lib64/python2.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
  """)
This will drop all existing records related to the specified DAG. Proceed? (y/n)y
Traceback (most recent call last):
  File "/usr/bin/airflow", line 32, in <module>
    args.func(args)
  File "/usr/lib/python2.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 258, in delete_dag
    raise AirflowException(err)
airflow.exceptions.AirflowException: Server error

However, I am able to delete the DAG with the curl command. Does anyone know whether this is a known issue with the CLI command, or am I doing something wrong?

versions <= 1.9.0:

There is not a command to delete a dag, so you need to first delete the dag file, and then delete all the references to the dag_id from the airflow metadata database.
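Rather than hard-coding the table list (which, as other answers show, varies between versions), one option is to discover every table that has a `dag_id` column and delete from each. This sketch is SQLite-specific (`sqlite_master` and `PRAGMA table_info`); on MySQL or Postgres the equivalent lookup would go through `information_schema.columns`. The demo schema is a simplified assumption, not Airflow's real one, the helper names are hypothetical, and tables that reference a DAG without a `dag_id` column (such as `dag_pickle`) are not covered.

```python
import sqlite3


def tables_with_dag_id(conn: sqlite3.Connection) -> list:
    """List tables that have a dag_id column (SQLite-only introspection)."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info returns one row per column; index 1 is the column name
    return [t for t in names
            if any(col[1] == "dag_id" for col in conn.execute(f'PRAGMA table_info("{t}")'))]


def delete_dag_references(conn: sqlite3.Connection, dag_id: str) -> dict:
    """Delete rows for dag_id from every table with a dag_id column; return counts."""
    tables = tables_with_dag_id(conn)
    # Stable sort that moves "dag" to the end, after the rows that refer to it
    tables.sort(key=lambda t: t == "dag")
    counts = {}
    with conn:  # one transaction: commit on success, roll back on error
        for t in tables:
            counts[t] = conn.execute(f'DELETE FROM "{t}" WHERE dag_id = ?', (dag_id,)).rowcount
    return counts


# Throwaway schema with a few assumed tables, just for the demo
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_active INTEGER);
    CREATE TABLE dag_run (id INTEGER PRIMARY KEY, dag_id TEXT);
    CREATE TABLE task_instance (task_id TEXT, dag_id TEXT);
    CREATE TABLE variable (key TEXT, val TEXT);
    INSERT INTO dag VALUES ('my_dag_id', 1), ('other_dag', 1);
    INSERT INTO dag_run (dag_id) VALUES ('my_dag_id'), ('my_dag_id'), ('other_dag');
    INSERT INTO task_instance VALUES ('t1', 'my_dag_id');
""")
counts = delete_dag_references(conn, "my_dag_id")
print(counts)  # {'dag_run': 2, 'task_instance': 1, 'dag': 1}
```

The `variable` table has no `dag_id` column, so it is left alone; that is the main advantage over `resetdb`, which wipes pools and variables too.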

WARNING

You can reset the Airflow metadata database, but this erases everything: not only the DAGs, but also their history, pools, variables, etc.

airflow resetdb and then airflow initdb

  • 1
    Yeah but you should let people know running `airflow resetdb` will erase everything in the database including any `pools`, `variables`, or even login session cookie data (meaning anyone with a logged in session cookie on their browser would get a `Server Error` when they refreshed the page and they would need to clear their cookies/cache or use Chrome's Incognito mode in order to log back in (something that is NOT good in a production environment because it makes the users think your Airflow went down...)). – Kyle Bridenstine Aug 17 '18 at 19:30
  • Also you have to run `airflow initdb` after running `airflow resetdb`. – Kyle Bridenstine Aug 17 '18 at 19:31
0

Based on the answer by @OlegYamin, I'm doing the following to delete a DAG from a Postgres-backed metadata DB, where Airflow uses the public schema.

delete from public.dag_pickle where id = (
    select pickle_id from public.dag where dag_id = 'my_dag_id'
);
delete from public.dag_run where dag_id = 'my_dag_id';
delete from public.dag_stats where dag_id = 'my_dag_id';
delete from public.log where dag_id = 'my_dag_id';
delete from public.sla_miss where dag_id = 'my_dag_id';
delete from public.task_fail where dag_id = 'my_dag_id';
delete from public.task_instance where dag_id = 'my_dag_id';
delete from public.xcom where dag_id = 'my_dag_id';
delete from public.dag where dag_id = 'my_dag_id';

WARNING: The effect/correctness of the first delete query is unknown to me. It is just an assumption that it is needed.
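On the ordering question: the first statement looks up `pickle_id` from the `dag` row, so it has to run before that row is deleted; if it runs afterwards, the subquery finds nothing and the pickle is left behind. A small SQLite demonstration with a simplified two-table schema (assumed columns only, not Airflow's full schema). It shows only the ordering constraint, not whether deleting the pickle is needed at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dag_pickle (id INTEGER PRIMARY KEY, pickle BLOB);
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, pickle_id INTEGER);
    INSERT INTO dag_pickle (id) VALUES (7);
    INSERT INTO dag VALUES ('my_dag_id', 7);
""")

PICKLE_DELETE = ("DELETE FROM dag_pickle WHERE id = "
                 "(SELECT pickle_id FROM dag WHERE dag_id = ?)")

with conn:
    # Correct order: remove the pickle while the dag row still points at it...
    removed_pickles = conn.execute(PICKLE_DELETE, ("my_dag_id",)).rowcount
    # ...and only then remove the dag row itself
    conn.execute("DELETE FROM dag WHERE dag_id = ?", ("my_dag_id",))
print(removed_pickles)  # 1

# Reversed order on a second DAG: the subquery returns NULL, so nothing is deleted
conn.executescript("""
    INSERT INTO dag_pickle (id) VALUES (8);
    INSERT INTO dag VALUES ('other_dag', 8);
""")
with conn:
    conn.execute("DELETE FROM dag WHERE dag_id = ?", ("other_dag",))
    orphaned = conn.execute(PICKLE_DELETE, ("other_dag",)).rowcount == 0
print(orphaned)  # True
```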

André C. Andersen
0

Just delete it from MySQL; that works fine for me. Delete the DAG's rows from the following tables:

  • dag
  • dag_constructor
  • dag_group_ship
  • dag_pickle
  • dag_run
  • dag_stats

(There might be more tables in future releases.) Then restart the webserver and worker.

Angel F Syrus
luckyfox
0

In newer Airflow versions there is a "Delete DAG" button (red x) in the UI, next to each DAG.


maleckicoa
  • You could be more specific, what is the new airflow version? Deleting the dag in this way does not delete the file, it should be done first. – 4ndt3s Jan 17 '21 at 06:48
  • My version was 1.8 if I remember correctly. Obviously you need to physically delete the DAG file from the dags folder. That is not the issue here, please read the question more carefully: "How can I delete a particular DAG from being run and shown in web GUI? ..." The question here was how to delete the DAG from GUI because it was cached there. To achieve that you can delete it as I explained above. – maleckicoa Apr 20 '21 at 08:47
0

If you're using Docker to run Airflow, you could use the BashOperator within a DAG to delete another DAG:

t1 = BashOperator(task_id='delete_dag_task', bash_command=f'airflow dags delete -y {dag_id}')

where dag_id is the ID of the DAG. This uses the standard CLI command instead of deleting records from the metadata database yourself. You also need to delete the DAG file from the dags directory, which can be done with a PythonOperator.

I have a DAG that does this:

import os
import shlex

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

# These args will get passed on to each operator
# You can override them on a per-task basis during operator initialization
default_args = {
    'start_date': days_ago(1),
    'owner': 'airflow',
    'retries': 1
}


def delete_dag(**context):
    # The DAG to delete comes from the trigger payload: {"conf": {"dag_name": ...}}
    conf = context["dag_run"].conf
    dag_id = conf["dag_name"]
    # -y skips the confirmation prompt; shlex.quote guards against shell injection
    t1 = BashOperator(task_id='delete_dag_task',
                      bash_command=f'airflow dags delete -y {shlex.quote(dag_id)}')
    t1.execute(context=context)


def delete_dag_file(**context):
    conf = context["dag_run"].conf
    dag_id = conf["dag_name"]
    # Assumes the DAG file is named <dag_id>.py and sits next to this file
    script_dir = os.path.dirname(__file__)
    dag_file_path = os.path.join(script_dir, '{}.py'.format(dag_id))
    try:
        os.remove(dag_file_path)
    except OSError:
        # Already gone (or not at the assumed path): nothing to do
        pass


with DAG('dag-deleter',
         schedule_interval=None,
         default_args=default_args,
         is_paused_upon_creation=False,
         catchup=False) as dag:

    # Distinct task names so they don't shadow the callables above
    delete_dag_task = PythonOperator(
        task_id="delete_dag",
        python_callable=delete_dag)

    delete_dag_file_task = PythonOperator(
        task_id="delete_dag_file",
        python_callable=delete_dag_file)

    delete_dag_task >> delete_dag_file_task

and I trigger the DAG using the REST API, passing the following payload in the http request:

{"conf": {"dag_name": "my_dag_name"} }
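If you trigger `dag-deleter` from Python rather than curl, a stdlib-only sketch of the request is below. It assumes Airflow 2's stable REST API endpoint `POST /api/v1/dags/{dag_id}/dagRuns` and that the basic-auth API backend is enabled; the helper name, URL and credentials are placeholders. The request is only built here, not sent.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request


def build_trigger_request(base_url, dag_id, conf, username, password):
    """Build (not send) a POST that triggers a DAG run via Airflow 2's stable REST API."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{quote(dag_id, safe='')}/dagRuns"
    body = json.dumps({"conf": conf}).encode("utf-8")
    # Assumes the basic_auth API backend; adjust for your auth setup
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {"Content-Type": "application/json", "Authorization": f"Basic {token}"}
    return Request(url, data=body, headers=headers, method="POST")


req = build_trigger_request("http://localhost:8080", "dag-deleter",
                            {"dag_name": "my_dag_name"}, "admin", "admin")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen()` would actually start the run.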
jignatius
0

Using the CLI is sometimes necessary when deleting the DAG takes a while and the web interface times out first.

Recent versions of Airflow have changed the format of the CLI. The command to delete a DAG is:

airflow dags delete my_dag

Reference documentation on the Airflow website: https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#delete_repeat1

Example:

/opt/airflow$ airflow dags delete my_dag
[2022-10-12 12:21:40,657] {__init__.py:38} INFO - Loaded API auth backend: <module 'airflow.api.auth.backend.deny_all' from '/home/airflow/.local/lib/python3.8/site-packages/airflow/api/auth/backend/deny_all.py'>
This will drop all existing records related to the specified DAG. Proceed? (y/n)y
[2022-10-12 12:21:48,775] {delete_dag.py:43} INFO - Deleting DAG: my_dag
Removed 135 record(s)
ZeWaren
-1

You can clear a set of task instances, as if they never ran, with:

airflow clear dag_id -s 2017-1-23 -e 2017-8-31

And then remove the DAG file from the dags folder.

David Lexa
-1

First --> Delete the DAG file from the $AIRFLOW_HOME/dags folder. Note: if you use subdirectories, you may have to dig through them to find the DAG file and delete it.

Second --> Delete the DAG from the Webserver UI using the delete button (x in circle)
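Finding the file in the first step can be scripted. A minimal, heuristic sketch that walks the dags folder recursively and lists `.py` files whose source mentions the DAG id. It is a plain substring match, so it can over-match (for example, `my_dag` also matches `my_dag_id`) and it misses ids built dynamically at runtime; the helper name `find_dag_files` is hypothetical. Review the matches before deleting anything.

```python
import tempfile
from pathlib import Path


def find_dag_files(dags_folder, dag_id):
    """Return .py files under dags_folder (recursively) whose text mentions dag_id."""
    matches = []
    for path in sorted(Path(dags_folder).rglob("*.py")):
        # Heuristic: a plain substring match on the file's source
        if dag_id in path.read_text(encoding="utf-8", errors="ignore"):
            matches.append(path)
    return matches


# Throwaway dags folder with one DAG file nested in a subdirectory
dags = Path(tempfile.mkdtemp())
(dags / "team_a").mkdir()
(dags / "team_a" / "etl.py").write_text("dag = DAG('my_dag_id')\n")
(dags / "other.py").write_text("dag = DAG('other_dag')\n")

found = find_dag_files(dags, "my_dag_id")
print([p.relative_to(dags).as_posix() for p in found])  # ['team_a/etl.py']
```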

-2

Remove the DAG (the one you want to delete) from the dags folder and run airflow resetdb.

Alternatively, you can go into the Airflow metadata database and manually delete the DAG's entries from the related tables (task_fail, xcom, task_instance, sla_miss, log, job, dag_run, dag, dag_stats).

Ayush Chauhan
  • 3
    Yeah but you should let people know running `airflow resetdb` will erase everything in the database including any `pools`, `variables`, or even login session cookie data (meaning anyone with a logged in session cookie on their browser would get a Server Error when they refreshed the page and they would need to clear their cookies/cache or use Chrome's Incognito mode in order to log back in (something that is NOT good in a production environment because it makes the users think your Airflow went down...)). Also you have to run `airflow initdb` after running `airflow resetdb`. – Kyle Bridenstine Aug 17 '18 at 19:32
  • Not recommended. For updating the database data prefer `upgradedb`. – Sebastián Palma May 25 '19 at 02:05
-6

For those who are still looking for answers: on Airflow version 1.8, it's very difficult to delete a DAG; you can refer to the answers above. But since 1.9 has been released, you just have to:

remove the DAG from the dags folder and restart the webserver.

SMDC
  • 5
    Note that `resetdb` will burn down and rebuild the entire metadata database. It's not possible to reset one DAG this way. https://airflow.apache.org/cli.html#resetdb – Taylor D. Edmiston Apr 06 '18 at 00:02