1

I am trying to install the GDAL package on an Azure Databricks cluster, but I cannot get it to work by any means.

Approaches that I've tried without success:

  1. Via the Libraries tab of the corresponding cluster --> Install New --> PyPI (under Library Source) --> entered gdal under Package

  2. Tried all approaches mentioned on https://forums.databricks.com/questions/13738/gdal-installation.html. None of them worked.

Details:

  1. Runtime: 6.1 (includes Apache Spark 2.4.4, Scala 2.11). With runtime 3.5 I got GDAL to work; however, an update to a higher runtime was necessary for other reasons.

  2. We're using Python 3.7.

Senna

3 Answers

1

We finally got it working by using an ML runtime in combination with the answer given in forums.databricks.com/answers/21118/view.html. Apparently the ML runtimes include conda, which that answer requires.

Senna
  • Confirmed. With the Databricks Runtime Version "7.4 ML (includes Apache Spark 3.0.1, Scala 2.12)" selected in the cluster setup, the following runs successfully from a notebook: conda install gdal=2.3.3 – Marc Stevenson Nov 06 '20 at 03:54
0

I have already answered a similar question. The link below should help you install the required library:

How can I download GeoMesa on Azure Databricks?

For your convenience, I am pasting the answer again; you just need to choose your required library in the search area.

You can install the GDAL library directly into your Databricks cluster.

1) Select the Libraries option; a new window will open.

2) Select the Maven option and click the 'search packages' option.

3) Search for the required library, select the library/jar version, and choose the 'select' option. That's it.

After installing the library/jar, restart your cluster. Then import the required classes in your Databricks notebook. I hope it helps. Happy coding.

venus
  • Unfortunately, it still does not work. GDAL is installed, however when trying to import GDAL in the Databricks notebook, I get a ModuleNotFoundError: No module named 'gdal'. Any ideas why this is happening? – Senna Nov 07 '19 at 10:35
  • Have you restarted the cluster? – venus Nov 07 '19 at 14:40
  • Yes, I restarted the cluster. – Senna Nov 09 '19 at 09:40
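
A note on the ModuleNotFoundError above: a library installed through the Maven option is a JVM artifact, so it would not by itself provide a Python module, which may be why the import fails even after a restart. Separately, the GDAL Python bindings are normally imported as `from osgeo import gdal` rather than `import gdal`. The following is a minimal diagnostic sketch, not a fix; the helper name `find_gdal_import_names` is made up for illustration. It reports which of the two import names resolves in the notebook's Python environment.

```python
import importlib.util


def find_gdal_import_names():
    """Report which candidate import names resolve in this environment.

    Hypothetical helper for illustration; it only looks the names up
    and does not import anything.
    """
    candidates = ("osgeo", "gdal")
    return {name: importlib.util.find_spec(name) is not None for name in candidates}


available = find_gdal_import_names()
print(available)

if available["osgeo"]:
    try:
        # The usual import path for the GDAL Python bindings.
        from osgeo import gdal
        print("GDAL release:", gdal.VersionInfo("RELEASE_NAME"))
    except ImportError as exc:
        # The package is present but its native libraries could not be loaded.
        print("osgeo is installed but failed to import:", exc)
else:
    print("No 'osgeo' package found in this Python environment.")
```

If both names come back as unavailable, the Python bindings are not installed in the environment the notebook runs in, regardless of what the cluster's Libraries tab shows.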
0

pip install https://manthey.github.io/large_image_wheels/GDAL-3.1.0-cp38-cp38-manylinux2010_x86_64.whl

It looks like you can install the package from this .whl file, but tasks such as GDAL.Translate will not actually run. This is the farthest I've gotten.

I found the above URL while searching for the binaries that GDAL needs. Note that you will have to run this every time you start your cluster.
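
One thing worth checking before using this wheel: its filename carries the tag cp38, meaning it was built for CPython 3.8, while the question mentions Python 3.7. pip refuses wheels whose tag does not match the running interpreter. The sketch below compares the two; the helper names are hypothetical, and only the URL comes from this answer.

```python
import sys

# URL taken from the answer above.
WHEEL_URL = (
    "https://manthey.github.io/large_image_wheels/"
    "GDAL-3.1.0-cp38-cp38-manylinux2010_x86_64.whl"
)


def wheel_python_tag(wheel_url):
    """Return the Python tag (for example 'cp38') from a wheel filename."""
    filename = wheel_url.rsplit("/", 1)[-1]
    # Wheel names follow: name-version[-build]-pythontag-abitag-platform.whl
    parts = filename[: -len(".whl")].split("-")
    return parts[-3]


def interpreter_tag():
    """Return the CPython tag of the running interpreter, e.g. 'cp37'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def wheel_matches_interpreter(wheel_url):
    """True if the wheel's Python tag equals the running interpreter's tag."""
    return wheel_python_tag(wheel_url) == interpreter_tag()


print("wheel tag:      ", wheel_python_tag(WHEEL_URL))
print("interpreter tag:", interpreter_tag())
print("compatible:     ", wheel_matches_interpreter(WHEEL_URL))
```

This only compares the Python version tag; it says nothing about whether the native libraries bundled in the wheel work on the cluster's operating system image.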

  • 'Somehow this works' might not be the most useful way of answering a question. Why does it work? Where did you get the url from? – gofvonx Apr 16 '21 at 16:04