After running pip install BigDL==0.8.0, the statement from bigdl.util.common import * runs in a local Python interpreter without issue.

However, with either of the following SparkSessions:

from pyspark.sql import SparkSession

spark = (SparkSession.builder.master('yarn')
    .appName('test')
    .config("spark.jars", "/BigDL/spark/dl/target/bigdl-0.8.0-jar-with-dependencies-and-spark.jar")
    .config('spark.submit.pyFiles', '/BigDL/pyspark/bigdl/util.zip')
    .getOrCreate()
)

or

spark = (SparkSession.builder.master('local')
    .appName('test')
    .config("spark.jars", "/BigDL/spark/dl/target/bigdl-0.8.0-jar-with-dependencies-and-spark.jar")
    .config('spark.submit.pyFiles', '/BigDL/pyspark/bigdl/util.zip')
    .getOrCreate()
)

I get the following error.

ImportError: ('No module named bigdl.util.common', <function subimport at 0x7fd442a36aa0>, ('bigdl.util.common',))

In addition to the 'spark.submit.pyFiles' config above, I have tried calling spark.sparkContext.addPyFile("util.zip") after the SparkSession successfully starts, where "util.zip" contains all of the Python files in https://github.com/intel-analytics/BigDL/tree/master/pyspark/bigdl/util .

I have also zipped all of the contents in this folder https://github.com/intel-analytics/BigDL/tree/master/pyspark/bigdl (branch-0.8) and pointed to that file in the .config('spark.submit.pyFiles', '/path/to/bigdl.zip'), but this also does not work.

How do I get the SparkSession to see these files?

Clay

1 Answer

Figured it out. The only thing that worked was calling spark.sparkContext.addPyFile("bigdl.zip") after the SparkSession has started, where "bigdl.zip" contains all of the files in https://github.com/intel-analytics/BigDL/tree/master/pyspark/bigdl (branch-0.8).

Not sure why .config('spark.submit.pyFiles', 'bigdl.zip') would not work.
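
For anyone reproducing this, here is a minimal sketch of one way to build such an archive. It relies on the fact that Python resolves bigdl.util.common inside a zip on sys.path as the entry bigdl/util/common.py, so the archive needs the bigdl/ folder itself at its root. Whether a missing top-level folder is what made the earlier util.zip attempt fail is only a guess. The helper name build_package_zip is hypothetical and is not part of BigDL or Spark.

```python
import os
import zipfile


def build_package_zip(package_dir, zip_path):
    """Zip a Python package so its own folder name is the top-level entry.

    Hypothetical helper for illustration; not a BigDL or Spark API.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full_path = os.path.join(root, name)
                    # Store paths relative to the parent directory,
                    # e.g. bigdl/util/common.py rather than util/common.py.
                    zf.write(full_path, os.path.relpath(full_path, parent))
    return zip_path
```

With a running session, the result could then be shipped as in the answer above, for example spark.sparkContext.addPyFile(build_package_zip("/BigDL/pyspark/bigdl", "bigdl.zip")), assuming the BigDL checkout lives at /BigDL as in the question.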

Clay