4

I am using Snowpark for Python. I want to import the imblearn package, but when I check the pre-installed packages at https://repo.anaconda.com/pkgs/snowflake/, this package is not available in the Snowpark Anaconda environment. How can I use this package in Snowpark?

Raphael Roth
Emre Becit

4 Answers

4

A number of open-source, third-party Python packages built and provided by Anaconda are available to use out of the box inside Snowflake.

Snowflake is constantly adding new packages, but if you don't find a specific package:

  • First, check whether the package contains only native Python code (i.e. it is a pure-Python package). If so, install it locally, zip it, upload the zip to a Snowflake stage, and reference that stage path in the IMPORTS parameter or via the add_import() method. This should work; a sketch of the workflow follows below.

  • If not, all you can do is wait for Snowflake to make the package available.

You can also run this query in Snowflake to get details about the available packages:
select * from information_schema.packages where language = 'python';
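
For the package in the question, a minimal sketch of that pure-Python workflow might look like the following. It assumes imbalanced-learn was installed locally with pip install --target=imblearn_pkg imbalanced-learn, the resulting imblearn/ folder was zipped into imblearn.zip, and that a stage named MY_STAGE already exists (the stage and file names are placeholders):

from snowflake.snowpark import Session
from snowflake.snowpark.types import StringType

# connection_parameters is a dict with your account, user, password, role, etc.
session = Session.builder.configs(connection_parameters).create()

# upload the zip to the stage (the Snowpark equivalent of SnowSQL's PUT)
session.file.put("imblearn.zip", "@MY_STAGE", auto_compress=False, overwrite=True)

# make the zipped package importable inside UDFs and stored procedures,
# and pull its dependencies from the Snowflake Anaconda channel
session.add_import("@MY_STAGE/imblearn.zip")
session.add_packages("numpy", "scipy", "scikit-learn")

def imblearn_version() -> str:
    import imblearn
    return imblearn.__version__

session.udf.register(func=imblearn_version, name="IMBLEARN_VERSION",
                     return_type=StringType(), replace=True)
session.sql("select imblearn_version()").show()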

PooMac
  • I'm also trying to do this but I'm getting a `ModuleNotFoundError` for an underlying module. Specifically, I'm trying to port over https://github.com/Mimino666/langdetect. Could you assist, please? – Maxim Feb 07 '23 at 01:36
  • Does it mean that for user-defined packages (say, in a private PyPI repo) we would need to zip them like you said? – linSESH Feb 17 '23 at 16:34
  • @linSESH yes, zip them, put them in a stage, and then reference them in your SPs or UD(T)Fs. – PooMac Feb 20 '23 at 12:04
  • @Maxim did you resolve this? I'm getting the same error message on my end. – Vid Stropnik Mar 22 '23 at 09:05
  • @VidStropnik yes and no. The issue was that I packaged the top-level project folder of the langdetect package, but should have zipped just the underlying module's src directory. That said, I still ended up encountering package-specific issues related to zipimport limitations. Hope that was helpful though. – Maxim Mar 23 '23 at 13:42
1

We've done the same, but for shap.

You need to install the package locally and then zip up the installed package directory.

# install shap (and its dependencies) into a local folder named shap/
pip install -t shap shap
cd shap
# zip only the inner shap/ package directory
zip -r shap.zip shap/

Then copy that zip file to an S3 location that backs a Snowflake external stage. In the example below, our stage is PYTHON_PACKAGES.
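
For that copy step, a minimal sketch, assuming the PYTHON_PACKAGES stage is an external stage backed by S3 (the bucket name and key below are placeholders):

import boto3

# upload the zip to the S3 location that the PYTHON_PACKAGES external stage points at
s3 = boto3.client("s3")
s3.upload_file("shap.zip", "my-packages-bucket", "python_packages/shap.zip")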

Then, use add_import to direct Snowpark to use it...

from snowflake.snowpark import Session
from snowflake.snowpark.types import StringType

session = Session.builder.configs(connection_parameters).create()
session.add_import("@PYTHON_PACKAGES/shap.zip")  # the zipped package on the stage
session.add_packages("numpy", "pandas==1.3.5", "scipy", "scikit-learn", "numba", "slicer", "tqdm")  # shap's dependencies, resolved from the Snowflake Anaconda channel

def test():
    import shap
    return shap.__version__

test_udf = session.udf.register(
    name="TEST",
    func=test,
    replace=True,
    return_type=StringType()
)
session.sql("select test()").show()
TigerHive
  • I did this, but am getting a `ModuleNotFoundError`. Are you explicitly importing shap anywhere or directing Snowpark to use it in any way, other than just importing it? – Vid Stropnik Mar 22 '23 at 09:04
  • Hi, @VidStropnik. My apologies. Just re-tested this and did, indeed, need to add an import. Not sure how I missed that off. We also subsequently discovered that `shap` _is_ included in Snowpark, so didn't need to use this method. I believe it is a valid method, though. – TigerHive Mar 28 '23 at 11:14
0

If the package you want to use only has native Python code, then you might be able to use it.

The simplest way is to install the package into your local environment, zip the installation directory, and add that zip using the IMPORTS parameter of CREATE FUNCTION, or the add_import() method if you are using the Snowpark API.
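
As a rough sketch of that flow with the Snowpark API (the package name mypackage, the local install directory local_pkgs, and the stage MY_STAGE are placeholders, not anything from this answer):

import shutil
from snowflake.snowpark import Session

# zip the locally installed package directory, e.g. after
#   pip install --target=local_pkgs mypackage
shutil.make_archive("mypackage", "zip", root_dir="local_pkgs", base_dir="mypackage")

# connection_parameters is a dict with your account credentials
session = Session.builder.configs(connection_parameters).create()

# upload the zip to a stage and register it as an import for UDFs and stored procedures
session.file.put("mypackage.zip", "@MY_STAGE", auto_compress=False, overwrite=True)
session.add_import("@MY_STAGE/mypackage.zip")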

  • Mats, do you have a link to a good example? I did install torch on my PC, zipped the torch folder, created a stage, and uploaded the zip to it. In Snowpark, I used add_import("@stage_name/zip_file_name.zip"). I got an error when I tried to use torch in my UDF. I suspect it is because torch depends on other packages which I did not upload to the stage, like torchvision, torchaudio, etc. – psabela Aug 23 '22 at 01:14
  • Not sure if you mean PyTorch or another package. PyTorch is available in Snowflake today (versions 1.8.1 and 1.10.2), so you do not need to add it; just use the packages parameter or add_packages. If you want to check which packages are available, you can run the following SQL: select * from information_schema.packages where language = 'python'. There is a nice article at https://medium.com/snowflake/deploying-custom-python-packages-from-github-to-snowflake-f0bb396480c7 on how to add additional packages. – Mats Stellwall Aug 24 '22 at 06:05
0

I did this using SnowSQL and a worksheet within Snowflake.