0

I'm using Apache Airflow to build a pipeline that generate reports for each week of some project, according to its data tables.

My second task get proper data to generate a chart regarding last four weeks activity. However, for the first run, whose date is equals to my dag start_start, I get the following error, using BigQueryOperator:

Error: FROM clause with table wildcards matches no table

Since, I'm trying to get a table that does not exist, using Jinja2 templating.

My code:

t2 = BigQueryOperator(
    task_id='create_agg_last_four_wks_prof',
    bql=None,
    bql="""
    #legacySql
    SELECT
        date_wk,
        mun,
        names,
        count(unique(if(plans>=1, cpf, 0))) as plans,
        count(unique(cpf)) as num_prof
    FROM
        TABLE_DATE_RANGE([kos.agg_wk_prof_],
            TIMESTAMP('{{ macros.ds_add(ds, -28) }}'),
            TIMESTAMP('{{ macros.ds_add(ds, -7) }}'))
    WHERE
        project = true
    GROUP BY
        date_wk, mun, names
    """,
    destination_dataset_table="xcfz-edu:kos.test_agg_last_four_wks_prof_{{ ds_nodash }}",
    write_disposition="WRITE_TRUNCATE",
    dag=dag

So, I want know how to write a empty table if the from clause matches no results. It would be a workaround solution for my issue.

I know that I can create a conditional task, but I'm seeking for a solution that does not change "what is already working". I mean, that does not change affect other tasks.

Is it even possible using SQL or/and Jinja2 statements?

Kfcaio
  • 442
  • 1
  • 8
  • 20
  • I think you are just missing `macro.ds_format` because as far as I know Google BigQuery expects `YYYYMMDD` and not `YYYY-MM-DD` which `ds_add` returns. – tobi6 Aug 04 '18 at 10:06

1 Answers1

1

You can do this using standard SQL in BigQuery:

SELECT
    date_wk,
    mun,
    names,
    count(DISTINCT if(plans>=1, cpf, 0)) as plans,
    count(DISTINCT cpf) as num_prof
FROM
    `pdc.agg_wk_prof_*`
WHERE
    _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 28 DAY)) AND
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND
    project = true
GROUP BY
    date_wk, mun, names
Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99