
Good morning

When running:

from pyspark.sql.types import IntegerType
import pyspark.sql.functions as F
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()


class ETL:

    def addone(x):
        return x + 1

    def job_run():

        df = spark.sql('SELECT 1 one').withColumn('AddOne', udf_addone(F.col('one')))
        df.show()
        

if (__name__ == '__main__'):

    udf_addone = F.udf(lambda x: ETL.addone(x), returnType=IntegerType())
    ETL.job_run()

I get the following error message:

Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.

I have reviewed the answers to "ERROR:SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063" and to "Spark: Broadcast variables: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion", but neither resolved the issue. I'd like to keep using a Spark UDF in my script.

Any help on this is appreciated.

Many thanks!

Christian Ivaha
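
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the lambda passed to F.udf refers to the ETL class, and ETL.job_run in turn refers to the module-level spark session, so serializing the UDF for the workers may pull the session (and its SparkContext) along with it. Under that assumption, one workaround is to keep the UDF body in a plain module-level function that touches nothing Spark-related. The names add_one and run_job below are illustrative and not from the original script.

```python
def add_one(x):
    """Pure Python function; it references no Spark objects."""
    return x + 1


def run_job():
    # Imports are deferred so add_one can be imported and tested without Spark.
    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F
    from pyspark.sql.types import IntegerType

    # The session is created on the driver and never referenced by the UDF body.
    spark = SparkSession.builder.getOrCreate()

    # Wrap the plain function directly instead of a lambda that reaches into a class.
    udf_add_one = F.udf(add_one, returnType=IntegerType())

    df = spark.sql("SELECT 1 AS one").withColumn("AddOne", udf_add_one(F.col("one")))
    df.show()
    return df
```

Calling run_job() from under an if __name__ == '__main__': guard would mirror the layout of the original script. If the class is still wanted for organization, it seems safest for the function wrapped by the UDF to stay outside any class whose methods reference the session.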
