
Hope you are all doing well.

We have been facing a strange issue with our notebooks. We use a couple of Scala packages, and when we import one of them in a Scala cell, the import fails with the error below. I am using spark-excel as the example here, but the problem is not limited to it.

    error: object crealytics is not a member of package com
    import com.crealytics.spark.excel._
           ^

Interestingly, the issue occurs only when the cluster starts from a cold state. If the cluster is already running, or if the job is re-run immediately after the failure, we do not see the problem.

Options Tried:

  1. Adding the library to the cluster library.

  2. Adding the library as dependent library during workflow creation.

  3. Checking for the presence of a class from the library inside a try/catch block, and sleeping if it is not found: Class.forName("shadeio.poi.util.IOUtils")

    Interestingly, this approach worked for a similar issue we had with a Python library:

       import time

       try:
           __import__(library_name)
       except ImportError:
           time.sleep(sleep_timer)  # wait and let the library get attached
    
  4. Additionally, we have tried plain sleeps of up to 5 minutes, with no effect:

   Thread.sleep(sleep_timer*1000)
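To make the question concrete, options 3 and 4 combined would look roughly like the sketch below: poll for a class from the library until it loads or a deadline passes, instead of a single fixed sleep. The helper name `awaitClass` and the `maxWaitMs`/`pollMs` values are my own placeholders, not anything Databricks provides; the class name is the shaded POI class mentioned in option 3.

```scala
// Hypothetical polling helper: repeatedly try Class.forName until the class
// resolves or maxWaitMs elapses. Returns true if the class became loadable.
def awaitClass(className: String,
               maxWaitMs: Long = 300000L,   // assumed cap: 5 minutes
               pollMs: Long = 10000L): Boolean = {
  val deadline = System.currentTimeMillis() + maxWaitMs
  while (System.currentTimeMillis() < deadline) {
    try {
      Class.forName(className)
      return true                            // class resolved: library attached
    } catch {
      case _: ClassNotFoundException => Thread.sleep(pollMs)  // wait and retry
    }
  }
  false                                      // gave up after maxWaitMs
}

// awaitClass("shadeio.poi.util.IOUtils")
```

Even when this returns true, the subsequent `import com.crealytics.spark.excel._` in the same notebook still fails for us, which is the behaviour I am asking about.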

Is there a way to prevent this issue? Is there a way to check for the package and add it manually if it is not there, or at least sleep and give it time to get registered? Or is setting a blind sleep of, say, 5 minutes without any check the only option?

Environment:

  • Azure Databricks - 10.4 runtime - Interactive Cluster
  • Spark 3.2.1 (We had same issue with Spark 3.0.1 as well)
  • Scala 2.12
  • spark-excel 0.17

Thank you for all the help in advance. Have a great day all...

rainingdistros
  • is it a job or interactive cluster? – Alex Ott Jun 14 '22 at 14:40
  • It is an interactive cluster... – rainingdistros Jun 14 '22 at 15:04
  • you can put your libraries in dbfs and then try? – teedak8s Jun 15 '22 at 00:34
  • I had thought of installing lib from dbfs, but had a doubt regarding dependencies. Say I have a maven package (e.g. spark-excel) and a PyPi (e.g. paramiko) module, I am under the impression that installing them via UI will also take care of its dependencies if available. Is this understanding correct ? If I am going to via DBFS, I will also have to consider the dependencies of those packages ? – rainingdistros Jun 15 '22 at 06:13

0 Answers