
I am trying to create a permanent function in Spark using geomesa-spark-jts. Geomesa-spark-jts has huge potential in the larger LocationTech community. I started by downloading geomesa-spark-jts, which contains the following:

[screenshot: contents of the downloaded geomesa-spark-jts directory]

After that I launched Spark like this (I made sure that the jar is on the classpath):

[screenshot: spark-shell launched with the GeoMesa jars (versions 2.0.0 and 1.3.0) on the classpath]

Now when I use ST_Translate, which comes with that package, it does give me a result:

[screenshot: ST_Translate returning a translated geometry in the shell]
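For reference, the working call looks roughly like this in the shell (a sketch, assuming GeoMesa 2.0.0 on Spark 2.2.0; `withJTS` comes from the geomesa-spark-jts import and registers the JTS types and functions on the session):

```scala
import org.locationtech.geomesa.spark.jts._

spark.withJTS // registers the JTS UDTs and UDFs, including st_translate
spark.sql("SELECT st_translate(st_geomFromWKT('POINT(0 0)'), 5, 12)").show()
```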

But the problem is that when I try to define ST_Translate as a UDF, I get the following error:

[screenshot: the error, with the JVM unable to find the class named in the CREATE TEMPORARY FUNCTION call]

  • This is strange indeed. Did you try using sqlContext.udf.register instead? I think this could be because of where it is looking for this class. The jar is added to the driver process but not to the executors. I had some weird errors like this when I was using a specific version of spark-llap, I think. – Subramaniam Ramasubramanian May 08 '18 at 10:52
  • Which version of Spark are you using? GeoMesa 2.0.0 works with Spark 2.2.0. We haven't tested it with Spark 2.3.0. – GeoJim May 08 '18 at 13:04
  • Also, as a note, you mentioned using version 2.0.0-m1, and then your screenshot has version 2.0.0 and 1.3.0. I'd suggest cleaning up your classpath and just using version 2.0.0. – GeoJim May 08 '18 at 13:05
  • @SubramaniamRamasubramanian I didn't use sqlContext.udf.register because at some point I have to specify the type of the input or output, but in both cases it's a Geometry type, which isn't known to Spark – Chems Bezzaz May 08 '18 at 13:06
  • Also, what's the function you are interested in adding? If it is a common geospatial function, it might be easiest to contribute it to GeoMesa. ;) – GeoJim May 08 '18 at 13:07
  • @GeoMesaJim the functions that I am interested in are ST_Translate, ST_MakeLine and ST_MakePolygon – Chems Bezzaz May 08 '18 at 13:08
  • ST_Translate isn't a class, it's a member variable: https://github.com/locationtech/geomesa/blob/master/geomesa-spark/geomesa-spark-jts/src/main/scala/org/locationtech/geomesa/spark/jts/udf/SpatialRelationFunctions.scala#L24 – Emilio Lahr-Vivaz May 08 '18 at 14:32
  • @EmilioLahr-Vivaz If you use the Esri geometry API and you do this: sqlContext.sql("""create temporary function st_point as 'com.esri.hadoop.hive.ST_Point'"""), you are able to create a function, and that is the same thing that I want to do here – Chems Bezzaz May 08 '18 at 15:01
  • From what I can tell, `create temporary function` is tied to hive. ST_Point is an actual class that extends `org.apache.hadoop.hive.ql.exec.UDF`. It may not be possible to do what you want. At the least, it would seem to require some additional integration work in spark-jts – Emilio Lahr-Vivaz May 08 '18 at 15:13
  • @EmilioLahr-Vivaz Yes, you are right. Hive lacks some functions such as ST_Translate, and I need that function – Chems Bezzaz May 08 '18 at 15:16
  • so you are trying to use st_translate with hive? – Emilio Lahr-Vivaz May 08 '18 at 16:34
  • @EmilioLahr-Vivaz Yes, exactly. If you have any idea how to do so, please help me; I'm desperate. – Chems Bezzaz May 08 '18 at 19:51
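Following up on the sqlContext.udf.register suggestion in the comments: once `withJTS` has registered the geometry UDTs, the GeoMesa function value can, in principle, be wrapped as an ordinary Spark UDF. A rough sketch, assuming GeoMesa 2.0.0 (which uses the com.vividsolutions JTS types); the name `my_translate` is chosen purely for illustration:

```scala
import org.locationtech.geomesa.spark.jts._
import org.locationtech.geomesa.spark.jts.udf.SpatialRelationFunctions

spark.withJTS // registers GeometryUDT, so Geometry can cross the UDF boundary

// ST_Translate is a (Geometry, Double, Double) => Geometry function value,
// so it can be handed to udf.register directly:
spark.udf.register("my_translate", SpatialRelationFunctions.ST_Translate)
spark.sql("SELECT my_translate(st_geomFromWKT('POINT(0 0)'), 5.0, 12.0)").show()
```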

3 Answers


The functions you mentioned are already supported in GeoMesa 2.0.0 for Spark 2.2.0. http://www.geomesa.org/documentation/user/spark/sparksql_functions.html
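With the GeoMesa SparkSQL support on the classpath, registration and usage would look roughly like this (a sketch, assuming the `SQLTypes.init` entry point from the GeoMesa 2.0.0 documentation):

```scala
import org.apache.spark.sql.SQLTypes

// Registers the geospatial UDTs and UDFs (st_translate, st_makeLine,
// st_makePolygon, ...) on the session's SQL context:
SQLTypes.init(spark.sqlContext)
spark.sql("SELECT st_translate(st_geomFromWKT('LINESTRING(0 0, 1 1)'), 10, 20)").show()
```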

GeoJim
  • I know they are already in GeoMesa since I'm using the geomesa-spark-jts jar – Chems Bezzaz May 08 '18 at 13:13
  • Mea culpa. It's early for me; I missed that you are trying to use the 'create temporary function' call. The FQCN for the function is likely something like org.locationtech.geomesa.spark.jts.udf.SpatialRelationFunctions$.ST_Translate. The $ (or something like it) is needed to reference the Scala object SpatialRelationFunctions. That might do it... – GeoJim May 08 '18 at 13:20
  • Yes, that's what I would like to do; could you elaborate on that point? – Chems Bezzaz May 08 '18 at 13:22
  • Of course! It looks like Spark needs the classname to register a new function name. The complete classname for the function is different from the one you used (hence the JVM complaining that it could not find the class). If you try adding the $ at the end of SpatialRelationFunctions in that call, it may work. Scala adds a dollar sign to the end of the class name for a Scala object. https://stackoverflow.com/questions/41570148/why-does-scala-place-a-dollar-sign-at-the-end-of-class-names – GeoJim May 08 '18 at 13:27
  • Even with the dollar sign added I still get the same error – Chems Bezzaz May 08 '18 at 13:55
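To see the naming convention described in the comments above, you can look up the compiled object directly (a minimal sketch using plain JVM reflection, nothing GeoMesa-specific):

```scala
// Scala compiles `object SpatialRelationFunctions` to a JVM class named
// SpatialRelationFunctions$ that holds the singleton in a static MODULE$ field:
val cls = Class.forName(
  "org.locationtech.geomesa.spark.jts.udf.SpatialRelationFunctions$")
println(cls.getField("MODULE$").get(null)) // the Scala object instance
```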

The geomesa-accumulo-spark-runtime jar is a shaded jar that includes the code from geomesa-spark-jts. You might be hitting issues with having the classes defined in two different jars.
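One way to tell which jar is actually supplying a class (a sketch using plain JVM reflection, nothing GeoMesa-specific):

```scala
// Prints the jar the class was loaded from; if it is not the jar you expect,
// a duplicate definition on the classpath is the likely culprit:
val location = Class
  .forName("org.locationtech.geomesa.spark.jts.udf.SpatialRelationFunctions$")
  .getProtectionDomain.getCodeSource.getLocation
println(location)
```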

Emilio Lahr-Vivaz
  • So you suggest removing geomesa-accumulo-spark-runtime from the classpath? – Chems Bezzaz May 08 '18 at 13:23
  • Yes, remove one or the other. If you're just using the JTS bindings, then remove the accumulo jar; if you want the additional GeoTools functionality, then remove the jts jar (it will still be present in the accumulo jar) – Emilio Lahr-Vivaz May 08 '18 at 13:34
  • When I removed geomesa-accumulo-spark-runtime from the classpath and did "import org.locationtech.geomesa.spark.jts._", I got this error: missing or invalid dependency detected while loading class file 'SpatialEncoders.class'. – Chems Bezzaz May 08 '18 at 13:43
  • You will also need JTS on the classpath (com.vividsolutions:jts-io, com.vividsolutions:jts-core, org.locationtech.spatial4j:spatial4j, and possibly their transitive dependencies). It may be simpler to use the accumulo-spark-runtime jar instead – Emilio Lahr-Vivaz May 08 '18 at 13:57

In order to use st_translate with Hive, I believe you would have to implement a new class that extends org.apache.hadoop.hive.ql.exec.UDF and invokes the GeoMesa function.
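A minimal sketch of such a wrapper (the class name `com.example.HiveSTTranslate` is hypothetical; WKT strings are used to avoid needing a Hive binding for the JTS Geometry type, and GeoMesa 2.0.0's com.vividsolutions packages are assumed):

```scala
package com.example

import com.vividsolutions.jts.io.{WKTReader, WKTWriter}
import org.apache.hadoop.hive.ql.exec.UDF
import org.locationtech.geomesa.spark.jts.udf.SpatialRelationFunctions

// Hypothetical Hive wrapper around GeoMesa's ST_Translate function value.
// Once this class is on Hive's classpath, register it with, e.g.:
//   CREATE TEMPORARY FUNCTION st_translate AS 'com.example.HiveSTTranslate'
class HiveSTTranslate extends UDF {
  def evaluate(wkt: String, dx: Double, dy: Double): String = {
    val geom = new WKTReader().read(wkt)
    new WKTWriter().write(SpatialRelationFunctions.ST_Translate(geom, dx, dy))
  }
}
```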

Emilio Lahr-Vivaz