
I am interested in performing Big Data geospatial analysis on Apache Spark. My data is stored in Azure Data Lake, and I am restricted to using Azure Databricks. Is there any way to install GeoMesa on Databricks? Moreover, I would like to use the Python API; what should I do?

Any help is much appreciated!

I. A
Discussions like this might be easier to have on Gitter or one of the GeoMesa email lists. See https://github.com/locationtech/geomesa#join-the-community for more info! – GeoJim Oct 29 '19 at 14:02

4 Answers


You can install the GeoMesa library directly into your Databricks cluster.

1) Select the Libraries option; a new window will open.


2) Select the Maven option and click 'Search Packages'.

3) Search for the required library, choose the library/jar version, and click 'Select'.
That's it. You can also look up the jar/library in the Maven repository.

After the library/jar is installed, restart your cluster. Then import the required classes in your Databricks notebook.
I hope it helps. Happy coding!

venus

As a starting point, without knowing any more details, you should be able to use the GeoMesa FileSystem data store against files stored in WASB (Windows Azure Storage Blob).
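A minimal sketch, assuming your files live in an Azure Blob Storage container reached over `wasbs://`: the helpers below only build the storage URI and a data store parameter map whose `fs.path` key points the FileSystem data store at it. The function names, container, account and path are hypothetical, and no Spark session is involved. Check the GeoMesa FileSystem data store docs for your version before relying on the parameter names.

```python
from typing import Dict


def build_wasbs_path(container: str, account: str, path: str = "") -> str:
    """Hypothetical helper: build a wasbs:// URI for a path inside a Blob Storage container."""
    # Strip leading slashes so the path joins cleanly after the host part.
    path = path.lstrip("/")
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path}"


def fsds_params(container: str, account: str, path: str) -> Dict[str, str]:
    """Hypothetical helper: parameter map pointing the GeoMesa FileSystem data store at WASB."""
    # "fs.path" is assumed to be the data store's root-path parameter;
    # all other parameters are left at their defaults.
    return {"fs.path": build_wasbs_path(container, account, path)}


if __name__ == "__main__":
    params = fsds_params("geodata", "mystorageaccount", "/fsds/root")
    print(params["fs.path"])
```

In a notebook you would also need the storage credentials configured (for WASB, typically via the `fs.azure.account.key.<account>.blob.core.windows.net` Spark setting) and the GeoMesa Spark runtime attached to the cluster before passing such a map to a GeoMesa reader.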

Emilio Lahr-Vivaz

CCRi (the backers of GeoMesa) has published a Spark-runtime-friendly build. A shaded fat jar for GeoMesa (the current version is 3.3.0) is available at the Maven coordinates org.locationtech.geomesa:geomesa-gt-spark-runtime_2.12:3.3.0, which can be used on Databricks. Since it is shaded, you can add Maven exclusions to get it to install cleanly: enter jline:*,org.geotools:* (without quotes) as the exclusions in the Databricks library UI.
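If you prefer scripting the install over clicking through the UI, the same coordinates and exclusions can be expressed as a request body for the Databricks Libraries API (`POST /api/2.0/libraries/install`). This is a minimal sketch that only builds the JSON payload; the helper name and cluster ID are hypothetical, and the payload shape (`cluster_id`, `libraries`, `maven.coordinates`, `maven.exclusions`) should be checked against the Databricks API reference for your workspace.

```python
import json
from typing import Dict, List, Optional

# Coordinates and exclusions taken from the answer above.
GEOMESA_RUNTIME = "org.locationtech.geomesa:geomesa-gt-spark-runtime_2.12:3.3.0"
EXCLUSIONS = ["jline:*", "org.geotools:*"]


def maven_install_payload(
    cluster_id: str, coordinates: str, exclusions: Optional[List[str]] = None
) -> Dict:
    """Hypothetical helper: build a Libraries API install body for one Maven library."""
    maven: Dict = {"coordinates": coordinates}
    # Only include the exclusions key when there is something to exclude.
    if exclusions:
        maven["exclusions"] = list(exclusions)
    return {"cluster_id": cluster_id, "libraries": [{"maven": maven}]}


if __name__ == "__main__":
    # "0123-456789-abcdefgh" is a placeholder cluster ID.
    payload = maven_install_payload("0123-456789-abcdefgh", GEOMESA_RUNTIME, EXCLUSIONS)
    print(json.dumps(payload, indent=2))
```

Sending the payload (with a personal access token and any HTTP client) is left out; as with the UI route, restart the cluster after the library is installed.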


Running GeoMesa within Databricks is not straightforward:

  • GeoMesa’s artifacts are published on Maven Central, but require dependencies that are only available on third-party repositories, which is cumbersome given Databricks’ library import mechanism.
  • GeoMesa conflicts with an older version of the scalalogging library present in the Databricks runtime (the infamous JAR Hell problem).

Reference: Use GeoMesa in Databricks

Hope this helps.

CHEEKATLAPRADEEP