My code is as follows:
import configparser
import glob
import multiprocessing
from itertools import repeat

import pyspark
from pyspark.sql import SparkSession

config = configparser.ConfigParser()

def processFiles(prcFile, spark: SparkSession):
    print(prcFile)
    app_id = spark.sparkContext.getConf().get('spark.app.id')
    app_name = spark.sparkContext.getConf().get('spark.app.name')
    print(app_id)
    print(app_name)

def main(configPath, args):
    config.read(configPath)
    spark: SparkSession = pyspark.sql.SparkSession.builder.appName("multiprocessing").enableHiveSupport().getOrCreate()
    mprc = multiprocessing.Pool(3)
    lst = glob.glob(config.get('DIT_setup_config', 'prcDetails') + 'prc_PrcId_[0-9].json')
    mprc.map(processFiles, zip(lst, repeat(spark.newSession())))
Now I want to pass a new Spark session (spark.newSession()) to each worker and process the data accordingly, but I get an error that says:
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
Any help will be highly appreciated.
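For context on what I believe is happening: multiprocessing.Pool.map pickles each argument and ships it to a separate worker process, and a SparkSession (with its underlying SparkContext) cannot be pickled, which is what SPARK-5063 guards against. A minimal sketch of the same failure mode, using a threading.Lock as a stand-in for the unpicklable session and multiprocessing.pool.ThreadPool (which shares the driver process, so nothing is pickled) as one possible workaround:

```python
import pickle
import threading
from multiprocessing.pool import ThreadPool

def process(item):
    # Each item is a (name, shared_object) pair, mirroring zip(lst, repeat(session))
    name, lock = item
    with lock:  # the shared object is usable here because no pickling happened
        return name.upper()

lock = threading.Lock()  # stand-in for a SparkSession: also unpicklable

# A process-based Pool would have to pickle `lock` for each worker -- this
# fails, just as it does for a SparkSession:
try:
    pickle.dumps(lock)
    picklable = True
except TypeError:
    picklable = False
print(picklable)  # False

# A ThreadPool keeps all work in the driver process, so the shared object
# is passed by reference and never pickled:
with ThreadPool(3) as pool:
    results = pool.map(process, [("a", lock), ("b", lock)])
print(results)  # ['A', 'B']
```

This is only an illustration of the pickling constraint, not Spark-specific advice; with a real session the same pattern (threads on the driver instead of processes) keeps the SparkContext where Spark requires it.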