SparkContext conflict with Spark UDF

Question

Good morning

When running:

from pyspark.sql.types import IntegerType
import pyspark.sql.functions as F
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()


class ETL:

    def addone(x):
        return x + 1

    def job_run():

        df = spark.sql('SELECT 1 one').withColumn('AddOne', udf_addone(F.col('one')))
        df.show()
        

if (__name__ == '__main__'):

    udf_addone = F.udf(lambda x: ETL.addone(x), returnType=IntegerType())
    ETL.job_run()

I get the following error message:

Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.

I have reviewed the answers to "ERROR:SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063" and to "Spark: Broadcast variables: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion", but neither resolved the problem. I'd like to keep using a Spark UDF in my script.

Any help on this is appreciated.

Many thanks!
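
One likely reading of the error, offered as an assumption rather than a confirmed diagnosis: the lambda passed to F.udf refers to the ETL class, so Spark has to serialize the whole class to ship it to the workers. ETL.job_run in turn refers to the global spark session, which holds the SparkContext, and that is what cannot be pickled. A minimal sketch of a possible workaround is to give the UDF a plain module-level function that touches nothing Spark-related. The names add_one, udf_add_one and main below are hypothetical and not from the original script.

```python
def add_one(x):
    """Plain module-level function with no reference to the Spark session."""
    # Spark passes None for SQL NULL values, so guard against it.
    return None if x is None else x + 1


def main():
    # The Spark imports and the session live inside main() (an assumption of
    # this sketch) so that nothing add_one refers to can reach a SparkSession
    # or SparkContext when the UDF is serialized.
    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.getOrCreate()

    # Wrap the plain function directly instead of a lambda that refers to a class.
    udf_add_one = F.udf(add_one, returnType=IntegerType())

    df = spark.sql("SELECT 1 AS one").withColumn("AddOne", udf_add_one(F.col("one")))
    df.show()
```

Calling main() from the driver script should then run the job. If the function has to stay on the class, the same idea would apply as long as no method of that class refers to spark or to the UDF object as a global, for example by passing the session in as an argument.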
