I am trying to run the TensorFlow Transform census example at https://www.tensorflow.org/tfx/tutorials/transform/census on a Databricks GPU cluster.
My environment:

    Databricks Runtime 7.1 ML (Spark 3.0.0, Scala 2.12, GPU)
    Python 3.7
    tensorflow==2.1.1
    tensorflow-transform==0.22.0
    apache_beam==2.21.0
When I run

    transform_data(train, test, temp)

I get this error:

    Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063
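For reference, this is roughly the shape of the tutorial's transform_data. It is a simplified sketch, not the tutorial's actual code: the 'age' feature spec and preprocessing_fn are toy stand-ins for the census ones, and beam.Create replaces the tutorial's CSV reading so the snippet stays self-contained:

    import apache_beam as beam
    import tensorflow as tf
    import tensorflow_transform as tft
    import tensorflow_transform.beam as tft_beam
    from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

    # Toy stand-ins for the tutorial's RAW_DATA_METADATA and preprocessing_fn.
    RAW_DATA_METADATA = dataset_metadata.DatasetMetadata(
        schema_utils.schema_from_feature_spec(
            {'age': tf.io.FixedLenFeature([], tf.float32)}))

    def preprocessing_fn(inputs):
        # Scaling needs a full analysis pass over the data; that pass is
        # what the Beam pipeline below executes.
        return {'age_scaled': tft.scale_to_0_1(inputs['age'])}

    def transform_data(train_data_file, test_data_file, working_dir):
        # The tutorial reads train_data_file/test_data_file as CSV here;
        # toy in-memory data is used instead to keep the sketch short.
        with beam.Pipeline() as pipeline:
            with tft_beam.Context(temp_dir=working_dir):
                raw_data = pipeline | beam.Create([{'age': 25.0}, {'age': 60.0}])
                raw_dataset = (raw_data, RAW_DATA_METADATA)
                # AnalyzeAndTransformDataset pickles preprocessing_fn and
                # runs the analysis pass on the Beam runner's workers.
                transformed_dataset, transform_fn = (
                    raw_dataset | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
                # ... the tutorial then also transforms the test data and
                # writes TFRecords plus the transform_fn to working_dir ...

Nothing in this code references Spark explicitly; by default the pipeline runs on Beam's DirectRunner.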
This seems to be the known Spark restriction tracked as SPARK-5063: https://issues.apache.org/jira/browse/SPARK-5063
I searched for solutions here, e.g. "how to deal with error SPARK-5063 in spark", but none of them worked for me.
In the example code, I do not see anywhere that SparkContext is accessed from a worker explicitly. Is it accessed from inside Apache Beam?
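For comparison, this is the classic way to trigger SPARK-5063 in plain PySpark. It is not the tutorial's code, just a minimal illustration of the mechanism: anything pickled and shipped to the workers must not capture the SparkContext:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    rdd = sc.parallelize(range(4))

    # The lambda closes over `sc`. When Spark pickles the closure to send
    # it to the workers, it finds the SparkContext inside and raises the
    # SPARK-5063 exception, since a SparkContext exists only on the driver.
    rdd.map(lambda x: sc.parallelize([x]).count()).collect()

If Beam's pickling similarly captures the Databricks notebook's global sc/spark along with the pipeline functions, that might explain the error even though the tutorial never references Spark directly, but I am not sure.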
Thanks