
Correct me if I'm wrong, but my understanding of a UDF in Snowpark is that you can send the function from your IDE and it will be executed inside Snowflake. I have a staged database file called GeoLite2-City.mmdb in an S3 bucket on my Snowflake account, and I would like to use it to retrieve information about an IP address. So my strategy was to:

1. Register a UDF from my IDE (PyCharm) that returns a response string.
2. Create a main function that simply queries the database about the IP address and gives me a response.

The problem is: how can the UDF and my code see the staged file at s3://path/GeoLite2-City.mmdb in my bucket? I simply used the bare file name (with geoip2.database.Reader('GeoLite2-City.mmdb') as reader:), assuming it would eventually be found, since stage_location='@AWS_CSV_STAGE' is the same stage where the UDF will be saved. But I'm not sure I understand correctly what the stage_location option refers to exactly.

At the moment I get the following error:

"Cannot add package geoip2 because Anaconda terms must be accepted by ORGADMIN to use Anaconda 3rd party packages. Please follow the instructions at https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#using-third-party-packages-from-anaconda."

Am I importing geoip2.database correctly in order to use it with Snowpark and a UDF? Do I import it by writing session.add_packages('geoip2')? Thank you for clearing my doubts. The instructions I'm following for geoip2 are here: https://geoip2.readthedocs.io/en/latest/

my code:

from snowflake.snowpark import Session
from snowflake.snowpark.types import StringType
import geoip2.database
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

session = None
user = '*********'
password = '*********'
account = '*********'
warehouse = '*********'
database = '*********'
schema = '*********'
role = '*********'

print("Connecting")
cnn_params = {
    "account": account,
    "user": user,
    "password": password,
    "warehouse": warehouse,
    "database": database,
    "schema": schema,
    "role": role,
}


def first_udf(ip: str) -> str:
    with geoip2.database.Reader('GeoLite2-City.mmdb') as reader:
        response = reader.city(ip)
        return response.country.iso_code


try:
    print('session..')
    session = Session.builder.configs(cnn_params).create()
    session.add_packages('geoip2')

    session.udf.register(
        func=first_udf,
        return_type=StringType(),
        input_types=[StringType()],
        is_permanent=True,
        name='SNOWPARK_FIRST_UDF',
        replace=True,
        stage_location='@AWS_CSV_STAGE',
    )

    session.sql("SELECT SNOWPARK_FIRST_UDF('203.0.113.0')").show()

except Exception as e:
    print(e)
finally:
    if session:
        session.close()
        print('connection closed..')
    print('done.')

UPDATE: I'm trying to solve it using a Java UDF, as I already have the geoip2-2.8.0.jar library staged. If I could import its methods to get the country of an IP it would be perfect; the problem is that I don't know exactly how to do it. I'm trying to follow these instructions: https://maxmind.github.io/GeoIP2-java/. I want to query the database, get the ISO code of the country as output, and do it from a Snowflake worksheet.

CREATE OR REPLACE FUNCTION GEO()
  returns varchar not null
  language java
  imports = ('@AWS_CSV_STAGE/lib/geoip2-2.8.0.jar', '@AWS_CSV_STAGE/geodata/GeoLite2-City.mmdb')
  handler = 'GeoLookup.test'
as
$$
import java.io.InputStream;
import java.net.InetAddress;
import com.maxmind.geoip2.DatabaseReader;
import com.maxmind.geoip2.model.CityResponse;

class GeoLookup {
    public String test() throws Exception {
        // Imported stage files are placed on the classpath,
        // so the .mmdb can be opened as a resource.
        try (InputStream db = GeoLookup.class.getResourceAsStream("/GeoLite2-City.mmdb")) {
            DatabaseReader reader = new DatabaseReader.Builder(db).build();
            CityResponse response = reader.city(InetAddress.getByName("128.101.101.101"));
            return response.getCountry().getIsoCode();
        }
    }
}
$$;

SELECT GEO();
Kaido
  • Hi - what is it about the error you are getting that is causing confusion? The error message is clear about the cause and solution and this is also covered by the documentation – NickW Dec 02 '22 at 00:45
  • I updated my question. I'm trying to solve it with java – Kaido Dec 03 '22 at 15:07

1 Answer


This will be more complicated than it looks:

  • To use session.add_packages('geoip2') in Snowflake you need to accept the Anaconda terms. This is easy if you can ask your account admin.
  • But then you can only get the packages that Anaconda has added to Snowflake in this way. The list is at https://repo.anaconda.com/pkgs/snowflake/, and I don't see geoip2 there yet.
  • So you will need to package your own Python code (until Anaconda sees enough requests for geoip2 in the wishlist). I described the process here: https://medium.com/snowflake/generating-all-the-holidays-in-sql-with-a-python-udtf-4397f190252b.
  • But wait! GeoIP2 is not pure Python, so you will need to wait until Anaconda packages the C extension libmaxminddb. This will be harder, as you can see their docs don't offer a straightforward way like other pip-installable C libraries.
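For reference, the usual pattern for shipping your own files to a Python UDF is session.add_import() plus reading the file from the UDF's import directory. A minimal sketch (the helper name imported_file_path is hypothetical; the snowflake_import_directory lookup follows the documented Snowpark pattern, with a local fallback so you can test outside Snowflake):

```python
import os
import sys


def imported_file_path(filename: str) -> str:
    # Inside a Snowflake Python UDF, files attached via session.add_import()
    # are unpacked into the directory exposed through
    # sys._xoptions["snowflake_import_directory"]; locally that key is absent,
    # so fall back to the current working directory for testing.
    import_dir = sys._xoptions.get("snowflake_import_directory", os.getcwd())
    return os.path.join(import_dir, filename)


# In the Snowpark session you would then do something like (not runnable here):
#   session.add_import('@AWS_CSV_STAGE/geodata/GeoLite2-City.mmdb')
# and inside the UDF open the database with:
#   geoip2.database.Reader(imported_file_path('GeoLite2-City.mmdb'))
print(imported_file_path('GeoLite2-City.mmdb'))
```

This only solves locating the staged .mmdb file; the geoip2 package itself still has to be available to the UDF, which is the hard part described above.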

So this will be complicated.

There are other alternative paths, like a commercial provider of this functionality (as I describe here: https://medium.com/snowflake/new-in-snowflake-marketplace-monetization-315aa90b86c).

There are other approaches to get this done without using a paid dataset, but I haven't written about those yet; someone else might before I get to it.

Btw, years ago I wrote something like this for BigQuery (https://cloud.google.com/blog/products/data-analytics/geolocation-with-bigquery-de-identify-76-million-ip-addresses-in-20-seconds), but today I was notified that Google recently deleted the tables that I had shared with the world (https://twitter.com/matthew_hensley/status/1598386009129058315).

So it's time to rebuild in Snowflake. But who (me?) and when is still a question.

Felipe Hoffa
  • I have maxmind-db-1.2.1.jar staged; can I create a UDF that uses that jar file and the database in my staging area by following these instructions? https://dev.maxmind.com/geoip/geolocate-an-ip/databases?lang=en#1-install-the-geoip2-client-library – Kaido Dec 02 '22 at 22:59
  • Interesting. So if you have .jar, then you are using Java, then a Python UDF would not be the right language for your UDF, and you wouldn't need Anaconda, nor get an error about it. – Felipe Hoffa Dec 03 '22 at 00:34
  • In the beginning I was trying to solve it with Python, but now I'm trying to solve it with Java, since I also have maxmind-db-1.2.1.jar staged with the methods to do this; the problem is that I don't know how to call its methods from within my UDF. I'm updating my question. – Kaido Dec 03 '22 at 14:16
  • Instead of updating the question, ask a new one. The question as asked has been answered, and we can start working on a new one – Felipe Hoffa Dec 03 '22 at 19:03
  • I created another question https://stackoverflow.com/questions/74669931/geoip2-snowflake-java-udf-integration-issue – Kaido Dec 03 '22 at 19:22