
I have a few questions regarding Snowpark with Python.

  1. Why do we need Snowpark when we already have the Snowflake Python connector (which is free) that we can use to connect a Python Jupyter notebook to the Snowflake DW?

  2. If we use Snowpark and connect from a local Jupyter notebook to run an ML model, does it use our local machine's computing power or Snowflake's computing power? If it is our local machine's computing power, how can we use Snowflake's computing power to run the ML model?

johnson

5 Answers

  1. Snowpark with Python lets you treat a Snowflake table like a Spark dataframe. This means you can run PySpark-style dataframe code against Snowflake tables without pulling the data out of Snowflake, and the compute is Snowflake compute, which is fully elastic, not your local machine.
  2. As long as you are executing Spark-style dataframe logic in Python, the compute happens on the Snowflake side. If you pull the data back to your machine to execute other logic (pandas, for example), then Snowpark will transfer the data to your local machine and the compute will happen there as normal (see the sketch after this list).
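Here is a minimal sketch of that split (the connection parameters and the SALES table are placeholders, not real names):

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# connection_parameters is a placeholder dict: account, user, password,
# warehouse, database, schema, etc.
session = Session.builder.configs(connection_parameters).create()

# A dataframe that points at a (hypothetical) Snowflake table.
# No data is transferred yet.
df = session.table("SALES")

# Filter and aggregate: Snowpark turns this into SQL and runs it on the
# Snowflake warehouse, not on your machine.
result = df.filter(col("AMOUNT") > 100).group_by("REGION").count()
result.show()  # only the small result set is sent to the client

# Converting to pandas pulls the result to the client; any pandas logic
# from here on runs on your local compute.
local_df = result.to_pandas()
```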

I recommend starting here to learn more:

https://docs.snowflake.com/en/developer-guide/snowpark/index.html

Mike Walton

One thing to keep in mind is that we are talking about multiple things here, so some clarification may help.

Snowpark is a library that you install through pip/conda, and it is a dataframe library, meaning you can define a dataframe object that points to data in Snowflake (there are also ways to get data into Snowflake using it). It does not pull the data back to the client unless you explicitly tell it to, and all computation is done on the Snowflake side.

When you do operations on a Snowpark dataframe, you are using Python code that generates SQL, which is executed in Snowflake using the same mechanism as if you wrote your own SQL. The execution of the generated SQL is triggered by action methods such as .show(), .collect(), .save_as_table() and so on.
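As a small illustration of this lazy behavior (the table and column names are made up, `session` is assumed to be an existing Snowpark session, and the `DataFrame.queries` property is available in recent versions of the library):

```python
from snowflake.snowpark.functions import col

# Building the dataframe only composes SQL; nothing is executed yet.
df = session.table("ORDERS").filter(col("STATUS") == "OPEN").select("ID", "TOTAL")

# Inspect the SQL that Snowpark has generated so far.
print(df.queries)

# An action method triggers execution of that SQL in Snowflake.
rows = df.collect()
```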

More information here.

As part of the Snowflake Python support there are also Python UDFs and Python stored procedures. You do not need Snowpark to create or use those, since you can do that with SQL using CREATE FUNCTION/CREATE PROCEDURE, but you can use Snowpark as well.

With Python UDFs and Python stored procedures you can bring Python code into Snowflake that will be executed on the Snowflake compute. It will not be translated into SQL, but will run in Python sandboxes on the compute nodes.

In order to use Python stored procedures or Python UDFs you do not have to set anything up; they are there like any other built-in feature of Snowflake.
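As an illustration, here is a minimal sketch of registering a Python UDF through Snowpark (the function, table and column names are made up, and `session` is assumed to be an existing Snowpark session):

```python
from snowflake.snowpark.functions import udf, col
from snowflake.snowpark.types import IntegerType

# Register a client-side Python function as a UDF. The code is shipped to
# Snowflake and executed in a Python sandbox on the compute nodes; it is
# not translated into SQL.
@udf(name="add_one", return_type=IntegerType(), input_types=[IntegerType()],
     replace=True, session=session)
def add_one(x: int) -> int:
    return x + 1

# Calling the UDF in a dataframe expression runs it server-side.
session.table("MY_TABLE").select(add_one(col("ID"))).show()
```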

More information about Python UDFs and about Python stored procedures.

The Snowflake Python Connector allows you to write SQL that is executed in Snowflake, and the result is then pulled back to the client to be used there, using the client's memory and so on. If you want your manipulation to be executed in Snowflake, you need to write SQL for it.
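For contrast, a minimal sketch of the connector workflow (the connection parameters, table and columns are placeholders):

```python
import snowflake.connector

# Connect with the plain Python connector.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db", schema="my_schema",
)

cur = conn.cursor()
# The SQL itself runs in Snowflake...
cur.execute("SELECT REGION, SUM(AMOUNT) FROM SALES GROUP BY REGION")
# ...but fetching pulls the whole result set into client memory, and any
# further manipulation happens locally unless you write more SQL.
rows = cur.fetchall()
```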

  • Thanks for your comment - I was trying to wrap my head around Snowpark as well, because it seemed to me that all the individual pieces already existed. Kudos to the company for wrapping it all up in a convenient package. – Leo Romanovsky Nov 06 '22 at 04:30

Using the existing Snowflake Python Connector, you bring the Snowflake data to the system that is executing the Python program, limiting you to the compute and memory of that system. With Snowpark for Python, you bring your Python code to Snowflake to leverage the compute and memory of the cloud platform.

Dave Welden

Snowpark Python provides the following benefits that are not available with the Snowflake Python connector:

  1. Users can bring their custom Python client code into Snowflake in the form of a UDF (user-defined function) and use those functions on dataframes. This allows data engineers, data scientists and data developers to code in their familiar way, in their language of choice, and to execute pipelines, ML workflows and data apps faster and more securely, in a single platform.

  2. Users can build and work with queries using the familiar syntax of dataframe APIs (a dataframe style of programming).

  3. Users can use the popular Anaconda libraries, which come pre-installed; users have access to hundreds of curated, open-source Python packages from Anaconda (a small sketch follows this list).

  4. Snowpark operations are executed lazily on the server, which reduces the amount of data transferred between your client and the Snowflake database.
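As a small sketch of point 3 (the package choice and function are illustrative; `session` is assumed to be an existing Snowpark session):

```python
from snowflake.snowpark.functions import udf
from snowflake.snowpark.types import FloatType

# Declare which Anaconda-provided packages the UDF needs; they are resolved
# from Snowflake's Anaconda channel on the server, not uploaded by the client.
session.add_packages("numpy")

@udf(name="log1p_udf", return_type=FloatType(), input_types=[FloatType()],
     replace=True, session=session)
def log1p_udf(x: float) -> float:
    import numpy as np  # available server-side via the Anaconda channel
    return float(np.log1p(x))
```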

For more details, please refer to the documentation.


I think that understanding Snowpark is complex, and @Mats' answer is really good. I created a blog post that I think provides some high-level guidance: https://www.mobilize.net/blog/lost-in-the-snowpark

orellabac