I am trying to run the TensorFlow Extended (TFX) example code at https://www.tensorflow.org/tfx/tutorials/transform/census on a Databricks GPU cluster.

My environment:

Databricks Runtime 7.1 ML (GPU): Spark 3.0.0, Scala 2.12
Python 3.7
tensorflow==2.1.1
tensorflow-transform==0.22.0
apache_beam==2.21.0

When I run

 transform_data(train, test, temp)
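
Here train and test are the paths to the downloaded census CSV files and temp is a scratch directory for tf.Transform, roughly like this (a sketch with placeholder paths; the tutorial sets them up slightly differently):

    import os
    import tempfile

    DATA_DIR = '/dbfs/tmp/census'  # placeholder: wherever adult.data / adult.test were downloaded
    train = os.path.join(DATA_DIR, 'adult.data')
    test = os.path.join(DATA_DIR, 'adult.test')
    temp = tempfile.mkdtemp()      # scratch directory for tf.Transform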

I get this error:

 Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063

This seems to be a known restriction of Spark RDDs: https://issues.apache.org/jira/browse/SPARK-5063
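
For reference, the usual pattern that triggers this exception (a minimal sketch of the general case, not code from the tutorial) is a closure that captures the SparkContext and is then shipped to the workers:

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rdd = sc.parallelize(range(10))

    def needs_context(x):
        # `sc` is captured by this closure. When Spark serializes the
        # closure to send it to the workers, pickling the SparkContext
        # fails with exactly the SPARK-5063 exception quoted above.
        return sc.parallelize([x]).count()

    # rdd.map(needs_context).collect()  # raises the SPARK-5063 exception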

I searched for solutions here, but none of them worked for me, including this question: how to deal with error SPARK-5063 in spark

In the example code, I do not see anywhere that SparkContext is explicitly accessed from a worker. Is it being referenced from inside Apache Beam?
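
For what it is worth, the following minimal tf.Transform pipeline (my own sketch, adapted from the tf.Transform getting-started example rather than taken from the census tutorial) should run entirely on Beam's in-process DirectRunner on the driver. If it succeeds on the same cluster while transform_data fails, the SparkContext reference is presumably being captured from the notebook/driver scope rather than coming from Beam itself:

    import tempfile

    import tensorflow as tf
    import tensorflow_transform as tft
    import tensorflow_transform.beam as tft_beam
    from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

    raw_data = [{'x': 1.0}, {'x': 2.0}, {'x': 3.0}]
    raw_metadata = dataset_metadata.DatasetMetadata(
        schema_utils.schema_from_feature_spec(
            {'x': tf.io.FixedLenFeature([], tf.float32)}))

    def preprocessing_fn(inputs):
        # One full-pass analyzer (min/max) plus an instance-level transform.
        return {'x_scaled': tft.scale_to_0_1(inputs['x'])}

    # Runs on Beam's in-process DirectRunner, i.e. entirely on the driver.
    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
        (transformed_data, _), _ = (
            (raw_data, raw_metadata)
            | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))

    print(transformed_data)
    # expected: [{'x_scaled': 0.0}, {'x_scaled': 0.5}, {'x_scaled': 1.0}]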

Thanks
