
I'm planning to deploy Airflow DAGs on Cloud Storage, set up a connection to GCS, and access those DAGs from Airflow running on a Google Compute Engine instance.

From the documentation it is very clear that remote logging is possible.
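
For reference, this is roughly what I mean by the remote logging setup from the docs (just a sketch; the exact section and keys depend on the Airflow version, and the bucket name and connection id are placeholders):

    [core]
    # write task logs to GCS instead of filling the VM's disk
    remote_logging = True
    remote_base_log_folder = gs://my-airflow-bucket/logs
    remote_log_conn_id = my_gcs_conn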

I want to keep Airflow's storage (logs and DAGs) entirely on GCS, since I have to manage these files and want to avoid modifying them on the VM once it is deployed.

Also, as the logs grow, they start taking up a lot of space on the cloud VM's disk.

Is it possible to store DAGs on GCS? If so, how can I achieve this?

Thanks in advance.

Tameem

4 Answers


I'm late to the party, but you can mount a bucket as a file system to your VM (or any Linux system).

It can be somewhat slow in my experience, compared to actual file systems, but if I understand you correctly, this should work for you.

Details for getting this working can be found in the documentation.
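
As a rough sketch (the bucket name and mount path below are placeholders), with Cloud Storage FUSE (gcsfuse) installed on the VM the mount itself is a one-liner:

    # mount the bucket holding the DAG files onto Airflow's dags folder
    # (--implicit-dirs makes GCS "directories" show up as folders)
    mkdir -p /home/airflow/dags
    gcsfuse --implicit-dirs my-airflow-bucket /home/airflow/dags

    # unmount when no longer needed
    fusermount -u /home/airflow/dags

The same documentation also covers mounting via an /etc/fstab entry, if you want the mount to survive restarts.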

Edo Akse
  • Although this works, the sync is really slow... This is introducing a huge delay in the modules. Nevertheless this is possible. – Tameem Sep 07 '18 at 18:30

Google Cloud Platform seems to be extensively integrated with Airflow for data processing and storage.

There is an official Google Cloud Blog article which explains how to connect Airflow with BigQuery. There is also a section on Google Cloud Platform integration in the official Airflow documentation, which may cover further details for a complete integration.

In summary, BigQuery seems to be the adequate product for you: it is a specialized Google tool that manages large volumes of data and makes it easy to manipulate and operate on with external tools and from inside other Google products (such as VMs).

Ggrimaldo
  • This does not answer the question of whether you can store DAGs on GCS; you've explained how Airflow connects to GCP – Simon D Mar 13 '18 at 14:23
  • I have been using Airflow to schedule operations on Google Cloud Platform. Just as @SimonD said, you did not explain what I asked. – Tameem Mar 17 '18 at 04:35
  • My pardon, I will rectify with another answer – Ggrimaldo Mar 27 '18 at 09:33

One way to save DAGs in GCS would be to store them as JSON in the bucket. That way, you could avoid storing the files on the VM.

An example showing how you can do this is in this other Stack Overflow post.
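
As a minimal sketch of the idea (the bucket name, blob path and JSON layout here are made up, and the loader file itself still has to sit in the local dags folder):

    # dag_loader.py -- lives in the local dags folder; reads a JSON spec from GCS
    # and builds a DAG from it. Bucket, blob and spec layout are hypothetical.
    import json
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from google.cloud import storage

    BUCKET = "my-airflow-bucket"          # hypothetical bucket
    SPEC_BLOB = "dag_specs/example.json"  # hypothetical spec, e.g.
    # {"dag_id": "dag_from_gcs", "schedule": "@daily",
    #  "tasks": [{"id": "hello", "bash_command": "echo hello"}]}

    spec = json.loads(
        storage.Client().bucket(BUCKET).blob(SPEC_BLOB).download_as_string()
    )

    dag = DAG(
        dag_id=spec["dag_id"],
        schedule_interval=spec["schedule"],
        start_date=datetime(2018, 1, 1),
        catchup=False,
    )

    # one task per entry in the JSON spec
    for task in spec["tasks"]:
        BashOperator(task_id=task["id"], bash_command=task["bash_command"], dag=dag)

Keep in mind that the GCS read happens every time the scheduler parses the file, so this mainly moves the DAG definitions, not the parsing work, off the VM.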

Ggrimaldo
  • The link you shared didn't contain information on Airflow DAGs. Airflow DAGs are definitions of a Directed Acyclic Graph (DAG) that are programmed, complex and dynamic in nature. – Tameem Mar 27 '18 at 19:44

I know this is an old question, but for anyone interested, you can now just use fully managed Airflow on GCP with Google Cloud Composer.
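
For illustration (the environment name and region below are placeholders), Composer gives every environment its own GCS bucket, and DAGs are deployed simply by copying files into it:

    # create a managed Airflow environment (Composer provisions a GCS bucket for it)
    gcloud composer environments create my-env --location us-central1

    # upload a DAG file into the environment's dags/ folder in that bucket
    gcloud composer environments storage dags import \
        --environment my-env --location us-central1 \
        --source my_dag.py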

Alessandro