
I understand from "How to access the response from Airflow SimpleHttpOperator GET request" that setting xcom_push=True in SimpleHttpOperator makes the returned data available through XCom. But it is not clear to me how to actually read it. Is it done by creating a PythonOperator with a callback and calling xcom_pull inside the callback? Some example code would be great to see.

What I am trying to do, just to learn Airflow, is read stock price data from Google Finance as CSV and then insert it into a MySQL database. Initially I thought I could just use two Operators:

SimpleHttpOperator >> MySqlOperator

But now I guess I have to add a PythonOperator in the middle:

SimpleHttpOperator >> PythonOperator >> MySqlOperator

In the PythonOperator, do I need to set provide_context=True to access the XCom value?

kee

1 Answer


You’re close, but I would use an HttpHook and a MySqlHook and glue them together in your own PythonOperator. Hooks are slightly lower-level primitives than Operators.

I think of an Operator as something that either performs an action (sends an email, retrieves a status) or transfers data between A and B (where one of the two is often a temporary/staging location). A Hook, on the other hand, is more like an open connection to a data source or destination.
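To make the hook approach concrete, here is a minimal sketch of a callable that a PythonOperator could run. It is only an illustration under several assumptions: the connection ids (`finance_api`, `mysql_default`), the endpoint `prices.csv`, the table `stock_prices` and the CSV column names `Date` and `Close` are all hypothetical, and the import paths shown are the Airflow 2 provider ones (on Airflow 1.x the hooks live in `airflow.hooks.http_hook` and `airflow.hooks.mysql_hook`).

```python
import csv
import io


def parse_price_rows(csv_text):
    """Turn CSV text into (date, close) tuples, skipping malformed rows.

    The column names "Date" and "Close" are assumptions about the feed.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        try:
            rows.append((record["Date"], float(record["Close"])))
        except (KeyError, TypeError, ValueError):
            continue  # missing column, empty cell, or non-numeric price
    return rows


def fetch_and_load(symbol, **_context):
    """Fetch CSV over HTTP and insert the parsed rows into MySQL."""
    # Imported inside the function so this module loads even without Airflow.
    # These are the Airflow 2 provider paths; adjust for your version.
    from airflow.providers.http.hooks.http import HttpHook
    from airflow.providers.mysql.hooks.mysql import MySqlHook

    # Hypothetical connection id and endpoint, defined in the Airflow UI.
    http = HttpHook(method="GET", http_conn_id="finance_api")
    response = http.run(endpoint="prices.csv", data={"q": symbol})

    rows = [(symbol, day, close) for day, close in parse_price_rows(response.text)]

    # Hypothetical table and column names.
    MySqlHook(mysql_conn_id="mysql_default").insert_rows(
        table="stock_prices",
        rows=rows,
        target_fields=["symbol", "price_date", "close"],
    )
    return len(rows)  # only a small count ends up in XCom, not the data
```

It could then be wired up with something like `PythonOperator(task_id="load_prices", python_callable=fetch_and_load, op_kwargs={"symbol": "GOOG"})`. Keeping the parsing in its own function means the data checks can be tested without a running Airflow instance.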

A good Airflow introductory blog post covering exactly your scenario (and a bit more) is this one: http://tech.marksblogg.com/airflow-postgres-redis-forex.html

You might think that building your own PythonOperator involves a lot of work, but as you can see in the full DAG code in the blog post above, most of that operator is actually validation of the incoming data.

I read somewhere in the Airflow guide that XCom isn’t really meant as the primary way to exchange the (potentially huge) data in your pipeline. It’s more for passing parameters between tasks in a DAG.
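If you do want to go the XCom route for a small payload, a callback along these lines should work. This is a sketch, not tested against a live DAG: the upstream task id `fetch_prices` is a hypothetical name for the SimpleHttpOperator task, and the callback assumes that task pushed the response body as text.

```python
def store_prices(**context):
    """Read the HTTP response text that an upstream task pushed to XCom."""
    ti = context["ti"]  # the running TaskInstance, supplied by Airflow
    # "fetch_prices" is a hypothetical task_id of the SimpleHttpOperator.
    csv_text = ti.xcom_pull(task_ids="fetch_prices")
    if not csv_text:
        raise ValueError("no XCom value found for task 'fetch_prices'")

    lines = csv_text.strip().splitlines()
    # ... parse the lines and hand them to a MySqlHook here ...
    return len(lines) - 1  # number of data rows, excluding the header line
```

On Airflow 1.x the PythonOperator needs `provide_context=True` for the context keyword arguments (including `ti`) to be passed to the callable; on Airflow 2 the context is passed automatically.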

jornh

[XComs](https://airflow.apache.org/concepts.html#xcoms) let tasks exchange messages, allowing more nuanced forms of control and shared state. The name is an abbreviation of “cross-communication”. – palik Jun 27 '19 at 07:07