
I want to register a custom SparkListener with Databricks' Spark context.

With plain Spark I can just pass the "spark.jars" and "spark.extraListeners" configs during spark-submit, or use the sparkContext.addSparkListener API.
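
For example, with plain Spark I can register the listener on an already-running context through the py4j gateway - a rough sketch, assuming the listener class is com.example.MySparkListener with a no-arg constructor (_jvm and _jsc are private attributes, so this is not a supported API):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Instantiate the JVM listener class (placeholder name, assumes a no-arg
    # constructor) and register it on the underlying Scala SparkContext.
    listener = spark._jvm.com.example.MySparkListener()
    spark.sparkContext._jsc.sc().addSparkListener(listener)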

For the Databricks setup, I have installed the jar containing the listener on my cluster. When I put the "spark.extraListeners" config in the "Advanced" config tab of the cluster, the cluster fails to initialize with a "Listener not found" error.

I tried setting it during SparkSession builder creation, like:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .appName("abc") \
        .config("spark.extraListeners", "mySparkListener") \
        .enableHiveSupport() \
        .getOrCreate()

Databricks won't add it: no errors are thrown, but the listener is not registered.

Is there any way to do this? Note: I am using Python notebooks on Databricks.


1 Answer


The problem is that by the time you get into the notebook, the SparkSession is already initialized, so your configuration has no effect.
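
A quick way to confirm this from a notebook (just a sketch): getOrCreate() hands back the session Databricks already created, and spark.extraListeners is a static config that is only read when the SparkContext starts, so setting it here is silently ignored:

    from pyspark.sql import SparkSession

    s = SparkSession.builder \
        .config("spark.extraListeners", "mySparkListener") \
        .getOrCreate()

    # `spark` is the session pre-created by Databricks for the notebook;
    # this prints True because getOrCreate() returned that same session.
    print(s is spark)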

You need to have this setting specified when the cluster is starting - you did that correctly by putting it into the cluster's Spark conf settings, but the problem is that libraries are installed after Spark has already started, so the necessary classes aren't found. You can fix this by adding a cluster init script, something like the one below - you need to have your library stored somewhere on DBFS (I use /FileStore/jars/my_jar.jar as an example):

    #!/bin/bash

    # copy the listener jar from DBFS onto the local disk before Spark starts
    cp /dbfs/FileStore/jars/my_jar.jar /databricks/jars

This script will copy your jar file into the jars directory on the local disk, and this happens before Spark starts, so the listener class can be found when spark.extraListeners is processed.
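
If it helps, one way to create that init script is directly from a notebook with dbutils.fs.put (the script path is just an example):

    # Write the init script to DBFS; the third argument overwrites an existing file
    dbutils.fs.put(
        "dbfs:/databricks/scripts/copy-listener-jar.sh",
        "#!/bin/bash\ncp /dbfs/FileStore/jars/my_jar.jar /databricks/jars\n",
        True,
    )

Then reference that script in the cluster's init scripts settings (under Advanced Options) and restart the cluster, so it runs before Spark comes up.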

Alex Ott