Hope you are all doing well.
We have been facing a weird issue with our notebooks. We use a couple of Scala packages, and when we import one of them in a Scala cell, the import fails with the error below. I am using spark-excel as the example here, but the problem is not limited to that package.
error: object crealytics is not a member of package com
import com.crealytics.spark.excel._ ^
Interestingly, the issue occurs only when the cluster starts from a cold state. If the cluster is already running, or if the job is re-run immediately after the failure, the import succeeds.
Options Tried:
Adding the library to the cluster library.
Adding the library as a dependent library during workflow creation.
Checking for a class from the library inside a try/catch block, and sleeping if it is not found: Class.forName("shadeio.poi.util.IOUtils")
Interestingly, this approach worked for a similar issue we had with a Python library:
try:
    __import__(library_name)
except:
    <sleep>
Additionally, we have tried simply sleeping for up to 5 minutes, with no effect:
Thread.sleep(sleep_timer*1000)
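A minimal sketch of the check-and-sleep approach described above, assuming the class name from the Class.forName call is the right one to probe. The names ClassWait, isClassPresent and waitForClass, and the attempt count and sleep interval, are hypothetical and chosen only for illustration. Note that Class.forName consults the JVM classloader at run time, whereas the import is resolved by the notebook's Scala compiler, so a successful check here may not guarantee that the import compiles.

```scala
object ClassWait {
  // Returns true if the class can be loaded by the current classloader.
  def isClassPresent(className: String): Boolean =
    try {
      Class.forName(className)
      true
    } catch {
      case _: ClassNotFoundException => false
    }

  // Polls for the class, sleeping between attempts.
  // Gives up and returns false once the attempts are used up.
  @scala.annotation.tailrec
  def waitForClass(className: String, attemptsLeft: Int, sleepMillis: Long): Boolean =
    if (isClassPresent(className)) true
    else if (attemptsLeft <= 1) false
    else {
      Thread.sleep(sleepMillis)
      waitForClass(className, attemptsLeft - 1, sleepMillis)
    }
}

// Hypothetical usage: 30 attempts, 10 seconds apart (roughly 5 minutes in total).
// val found = ClassWait.waitForClass("shadeio.poi.util.IOUtils", 30, 10000L)
```

Polling in a loop rather than sleeping once lets the cell continue as soon as the class becomes loadable, instead of always paying the full wait.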
Is there a way to prevent this issue? Is there a way to check for the package and add it manually if it is missing, or at least to sleep and give it time to get registered? Or is sleeping for, say, 5 minutes without any check the only option?
Environment:
- Azure Databricks - 10.4 runtime - Interactive Cluster
- Spark 3.2.1 (we had the same issue with Spark 3.0.1 as well)
- Scala 2.12
- spark-excel 0.17
Thank you for all the help in advance. Have a great day all...