Good morning
When running:
    from pyspark.sql.types import IntegerType
    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    class ETL:
        def addone(x):
            return x + 1

        def job_run():
            df = spark.sql('SELECT 1 one').withColumn('AddOne', udf_addone(F.col('one')))
            df.show()

    if __name__ == '__main__':
        udf_addone = F.udf(lambda x: ETL.addone(x), returnType=IntegerType())
        ETL.job_run()
I get the following error message:
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
I have reviewed the answers to "ERROR:SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063" and to "Spark: Broadcast variables: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion", but neither resolved the problem. I'd like to keep using a Spark UDF in my script.
Any help on this is appreciated.
Many thanks!
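
For reference, here is a minimal sketch of one variant that may avoid the error. It assumes the cause is that serializing the lambda also serializes the ETL class it refers to, and with it job_run's reference to the global spark session. The names add_one and run_job are hypothetical and not part of the script above; the UDF wraps a plain module-level function that touches no Spark objects.

```python
def add_one(x):
    """Plain module-level function: it references no Spark objects."""
    return x + 1


def run_job():
    # Assumption: this function is called on the driver only. The Spark
    # imports and the session live inside it, so the function wrapped by
    # the UDF does not refer to a SparkSession or SparkContext.
    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.getOrCreate()
    udf_add_one = F.udf(add_one, returnType=IntegerType())
    df = spark.sql('SELECT 1 one').withColumn('AddOne', udf_add_one(F.col('one')))
    df.show()
```

Calling run_job() from the `if __name__ == '__main__':` block would take the place of ETL.job_run(). If addone has to stay on the class, the same idea should apply as long as the class itself holds no reference to spark.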