
I have followed the steps in this notebook to install rasterframes on my Databricks cluster.

Eventually I am able to import the following:

from pyrasterframes import rf_ipython
from pyrasterframes.utils import create_rf_spark_session
from pyspark.sql.functions import lit 
from pyrasterframes.rasterfunctions import *

But when I run:

spark = create_rf_spark_session()

I get the following error: "java.lang.NoClassDefFoundError: scala/Product$class".

I am using a cluster with Spark 3.2.1. I also installed Java Runtime Environment 1.8.0_341, but this made no difference.

Could someone explain what went wrong? And how to solve this error?

The full error log:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-2354681519525034> in <module>
      5 
      6 # Use the provided convenience function to create a basic local SparkContext
----> 7 spark = create_rf_spark_session()
 
/databricks/python/lib/python3.8/site-packages/pyrasterframes/utils.py in create_rf_spark_session(master, **kwargs)
     97 
     98     try:
---> 99         spark.withRasterFrames()
    100         return spark
    101     except TypeError as te:
 
/databricks/python/lib/python3.8/site-packages/pyrasterframes/__init__.py in _rf_init(spark_session)
     42     """ Adds RasterFrames functionality to PySpark session."""
     43     if not hasattr(spark_session, "rasterframes"):
---> 44         spark_session.rasterframes = RFContext(spark_session)
     45         spark_session.sparkContext._rf_context = spark_session.rasterframes
     46 
 
/databricks/python/lib/python3.8/site-packages/pyrasterframes/rf_context.py in __init__(self, spark_session)
     37         self._jvm = self._gateway.jvm
     38         jsess = self._spark_session._jsparkSession
---> 39         self._jrfctx = self._jvm.org.locationtech.rasterframes.py.PyRFContext(jsess)
     40 
     41     def list_to_seq(self, py_list):
 
/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1566 
   1567         answer = self._gateway_client.send_command(command)
-> 1568         return_value = get_return_value(
   1569             answer, self._gateway_client, None, self._fqn)
   1570 
 
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    115     def deco(*a, **kw):
    116         try:
--> 117             return f(*a, **kw)
    118         except py4j.protocol.Py4JJavaError as e:
    119             converted = convert_exception(e.java_exception)
 
/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)
 
Py4JJavaError: An error occurred while calling None.org.locationtech.rasterframes.py.PyRFContext.
: java.lang.NoClassDefFoundError: scala/Product$class
    at org.locationtech.rasterframes.model.TileDimensions.<init>(TileDimensions.scala:35)
    at org.locationtech.rasterframes.package$.<init>(rasterframes.scala:55)
    at org.locationtech.rasterframes.package$.<clinit>(rasterframes.scala)
    at org.locationtech.rasterframes.py.PyRFContext.<init>(PyRFContext.scala:49)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:250)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: scala.Product$class
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    ... 15 more

Many thanks in advance!

2 Answers


That version of RasterFrames (0.8.4) works only with DBR 6.x, which uses Spark 2.4 & Scala 2.11; it will not work on Spark 3.2.x, which uses Scala 2.12. You may try version 0.10.1 instead, which was upgraded to Spark 3.1.2, but it may not work with Spark 3.2 (I haven't tested it).
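A quick way to confirm the mismatch is to check which Scala binary version the cluster's JVM is running. This is just a sketch: the commented py4j call assumes a live SparkSession named `spark` (e.g. in a Databricks notebook), and the `scala_binary_version` helper is purely illustrative, not part of RasterFrames:

```python
# Illustrative helper: derive the Scala binary version from the output of
# scala.util.Properties.versionString(), e.g. "version 2.12.10" -> "2.12".
# On Spark 3.2 this yields "2.12", which is why a Scala 2.11 build of
# RasterFrames fails with NoClassDefFoundError: scala/Product$class.
def scala_binary_version(version_string):
    numeric = version_string.split()[-1]      # e.g. "2.12.10"
    return ".".join(numeric.split(".")[:2])   # e.g. "2.12"

# On a live cluster you could obtain the string through py4j:
# raw = spark.sparkContext._jvm.scala.util.Properties.versionString()
# print(scala_binary_version(raw))
print(scala_binary_version("version 2.12.10"))  # 2.12
```

If this prints 2.12, any RasterFrames artifact built for Scala 2.11 cannot load on that cluster.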

If you're looking to run geospatial queries on Databricks, you can look into the Mosaic project from Databricks Labs - it supports standard st_ functions & many other things. You can find the announcement in the corresponding blog post; more information is in the talk at Data & AI Summit 2022, the documentation & the project on GitHub.

Alex Ott
  • I need to do zonal statistics (calculating statistics of raster cells within a polygon). As far as I am aware, this is not provided within Mosaic yet? Do I conclude it will not be possible to use rasterframes with my current Databricks setup? – Joost Neujens Jul 27 '22 at 16:22
  • It should be possible to use rasterframes - there is a version for Spark 3… so you may just need to use the correct version – Alex Ott Jul 27 '22 at 17:06
  • My geospatial colleagues pointed out that Apache Sedona has some support for rasters as well – Alex Ott Jul 27 '22 at 17:18
  • And rasterframes 0.10.x should definitely work with DBR 7.3 with a library built from this branch: https://github.com/mjohns-databricks/rasterframes/tree/0.10.0-databricks – Alex Ott Jul 27 '22 at 18:21
  • Thanks so much! I'll have a look at Sedona... I now have rasterframes 0.10.0 installed, and I get a slightly different error message: "java.lang.NoClassDefFoundError: Could not initialize class org.locationtech.rasterframes.package$" – Joost Neujens Jul 27 '22 at 19:06
  • You need the full stack trace - unfortunately such connectors rely on Spark internals and are not compatible between major releases – Alex Ott Jul 27 '22 at 19:16

I managed to get version 0.10.x of rasterframes working with Databricks runtime version 9.1 LTS. At the time of writing you cannot upgrade to a higher version of the runtime, because of pyspark version differences. Below you'll find a step-by-step guide on how to get this to work:

  • The cluster access mode should be single user, otherwise you'll get this error:

    py4j.security.Py4JSecurityException: Constructor public org.apache.spark.SparkConf(boolean) is not whitelisted
    
  • At the time of writing, the Databricks runtime version needs to be 9.1 LTS.

  • An init script should install GDAL: 

    pip install gdal -f https://girder.github.io/large_image_wheels
    
  • The RasterFrames JAR should be built from source:

    git clone https://github.com/mjohns-databricks/rasterframes.git
    cd rasterframes
    sbt publishLocal
    
  • The RasterFrames JAR should be uploaded to Databricks. After the build, the file is located at:

    /pyrasterframes/target/scala-2.12/pyrasterframes-assembly-0.10.1-SNAPSHOT.jar
    
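Once the JAR is attached to the cluster, session creation should succeed. A minimal sketch of that last step, with two labeled assumptions: the DBFS path below is illustrative (adjust it to wherever you uploaded the assembly JAR), and it assumes the `**kwargs` of `create_rf_spark_session(master, **kwargs)` (the signature visible in the traceback above) are applied as Spark config options:

```python
# Sketch: create the RasterFrames session once the assembly JAR is attached.
# The DBFS path is an illustrative assumption, not a fixed location.
jar_path = "dbfs:/FileStore/jars/pyrasterframes-assembly-0.10.1-SNAPSHOT.jar"
spark_opts = {"spark.jars": jar_path}  # extra Spark config, assumed forwarded

# On the cluster (requires pyrasterframes installed):
# from pyrasterframes.utils import create_rf_spark_session
# spark = create_rf_spark_session(**spark_opts)
```

If the JAR is instead installed as a cluster library through the Databricks UI, the `spark.jars` option may be unnecessary.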
Cloudkollektiv