
I am trying to find a solution to move files from an S3 bucket to a Snowflake internal stage (not directly into a table) with Airflow, but it seems that the PUT command is not supported by the current Snowflake operator.

I know there are other options like Snowpipe, but I want to showcase Airflow's capabilities. COPY INTO is also an alternative, but I want to load DDL statements from files rather than run them manually in Snowflake.

This is the closest I could find, but it uses COPY INTO a table:

https://artemiorimando.com/2019/05/01/data-engineering-using-python-airflow/

Also: How to call snowsql client from python

Is there any way to move files from an S3 bucket to a Snowflake internal stage through Airflow + Python + SnowSQL?

Thanks!

  • This is an unusual request. Why are you trying to move files from S3 to an internal stage without the intent to load them into Snowflake? – Mike Walton May 12 '20 at 18:42
  • @MikeWalton I do have that intent, but I want to do it in 2 steps: 1. move the files to the stage with PUT, 2. COPY INTO. I know things can be done more easily, but the key here is showcasing Airflow orchestration. – costinrio May 12 '20 at 18:56

1 Answer


I recommend you execute the COPY INTO command from within Airflow to load the files directly from S3 instead. There isn't a great way to get files to an internal stage from S3 without hopping the files through another machine (like the Airflow machine): you'd use SnowSQL to GET the files from S3 to local, and then PUT them from local to the internal stage. The only way to execute a PUT to an internal stage is through SnowSQL.
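
For illustration, here is a minimal sketch of that approach, assuming Airflow 2.x with the apache-airflow-providers-snowflake package installed, an Airflow connection named `snowflake_default`, and placeholder stage, table, and file names. It also runs a DDL file before the load, since the operator's `sql` field is templated and accepts `.sql` files:

```python
# Minimal sketch of the COPY INTO approach. All connection, stage, table,
# and file names below are placeholders -- adjust to your environment.
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="s3_to_snowflake_copy",
    start_date=datetime(2020, 5, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # DDL kept in a .sql file in the repo; Airflow loads and renders it
    # because the "sql" field is a templated field.
    create_table = SnowflakeOperator(
        task_id="create_table",
        snowflake_conn_id="snowflake_default",
        sql="sql/create_my_table.sql",
    )

    # Load straight from the external (S3) stage -- no files touch the worker.
    copy_into = SnowflakeOperator(
        task_id="copy_into_table",
        snowflake_conn_id="snowflake_default",
        sql="COPY INTO my_table FROM @my_s3_stage FILE_FORMAT = (TYPE = CSV)",
    )

    create_table >> copy_into
```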

  • Thanks @MikeWalton, do you also know if it is possible to run a DDL script that contains CREATE statements, instead of manually copying the code into the web UI for example? – costinrio May 12 '20 at 19:42
  • Yes, you can execute the CREATE TABLE statement before your COPY INTO statement through the Python connector (via Airflow, in your case). – Mike Walton May 12 '20 at 21:46
  • Mike's answer is what I would do too. There is even a SnowflakeOperator available in Airflow which allows you to execute the `COPY INTO` command easily. – Simon D May 14 '20 at 09:16
  • Downloading the file to local disk would crash DAGs in a production environment, since workers don't share the same disk. – Victor Mayrink Jul 14 '23 at 18:55
  • @VictorMayrink This post is fairly old. With the latest connectors from Snowflake, you could connect to both S3 and Snowflake (PUT) using Airflow. The command could then pull the data from S3 and PUT it to the Snowflake internal stage in a single command. This would bypass the need to actually store any data on local drives. – Mike Walton Jul 16 '23 at 15:41
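
To make that last comment concrete, here is a minimal sketch under those assumptions: recent versions of the Snowflake Python connector accept a `file_stream` argument on PUT, so an Airflow task can stream an S3 object into an internal stage without writing to the worker's disk. The bucket, key, stage, and credential values below are placeholders:

```python
# Sketch only: stream an S3 object into a Snowflake internal stage without
# writing to local disk. Assumes recent boto3 and snowflake-connector-python
# (which supports a file_stream argument for PUT); all names are placeholders.
import io

import boto3
import snowflake.connector

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-bucket", Key="data/my_file.csv")
stream = io.BytesIO(obj["Body"].read())  # hold the object in memory

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_wh",
    database="my_db",
    schema="my_schema",
)
try:
    cur = conn.cursor()
    # The local file name in the PUT command only names the staged file;
    # the bytes actually come from file_stream.
    cur.execute("PUT file://my_file.csv @my_internal_stage", file_stream=stream)
finally:
    conn.close()
```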