
I want to debug Spark code in PyCharm because it is easier to debug there. But I need to add the spark-redis jar, otherwise I get `Failed to find data source: redis`.

The code that connects to Redis is:

```python
spark = SparkSession \
            .builder \
            .appName("Streaming Image Consumer") \
            .config("spark.redis.host", self.redis_host) \
            .config("spark.redis.port", self.redis_port) \
            .getOrCreate()
```

How do I fix this when running from PyCharm?

I have tried adding `spark.driver.extraClassPath` in `$SPARK_HOME/conf/spark-defaults.conf`, but it does not work.

I also tried adding the environment variable `PYSPARK_SUBMIT_ARGS` with `--jars ...` in the run configuration, but it raises a different error.
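One common pitfall with the `PYSPARK_SUBMIT_ARGS` approach (a sketch, with a placeholder jar path, not the asker's confirmed fix): when launching a plain Python script rather than `spark-submit`, the variable's value typically needs to end with `pyspark-shell`, otherwise importing pyspark tends to fail with a Java gateway error. Setting it in code before the first pyspark import is equivalent to setting it in the run configuration:

```python
import os

# Must be set before pyspark is imported anywhere in the process.
# The jar path is a placeholder; point it at your local spark-redis build.
# Note the trailing "pyspark-shell" token, which pyspark expects when the
# script is started directly by the Python interpreter.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars /path/to/spark-redis-jar-with-dependencies.jar pyspark-shell"
)
```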

Litchy

1 Answer


Adding `spark.driver.extraClassPath` to `spark-defaults.conf` works for me with Spark 2.3.1:

```
cat /Users/oleksiidiagiliev/Soft/spark-2.3.1-bin-hadoop2.7/conf/spark-defaults.conf

spark.driver.extraClassPath /Users/oleksiidiagiliev/.m2/repository/com/redislabs/spark-redis/2.3.1-SNAPSHOT/spark-redis-2.3.1-SNAPSHOT-jar-with-dependencies.jar
```

Please note that this is a jar with dependencies (you can build one from the sources using `mvn clean install -DskipTests`).
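As an alternative to editing `spark-defaults.conf`, the standard `spark.jars` option can also be set on the builder itself, so the script carries its own dependency. This is a sketch with a placeholder jar path, not something verified in PyCharm specifically:

```python
from pyspark.sql import SparkSession

# Sketch: register the fat jar via spark.jars before the session is created.
# The jar path is a placeholder for your local spark-redis build.
spark = (
    SparkSession.builder
    .appName("Streaming Image Consumer")
    .config("spark.jars", "/path/to/spark-redis-jar-with-dependencies.jar")
    .config("spark.redis.host", "localhost")
    .config("spark.redis.port", "6379")
    .getOrCreate()
)
```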

Also, I added the pyspark libraries and the `SPARK_HOME` environment variable to the PyCharm project as described here: https://medium.com/parrot-prediction/integrating-apache-spark-2-0-with-pycharm-ce-522a6784886f

fe2s
  • my `spark-defaults.conf` content is `spark.driver.extraClassPath /home/litchy/Projects/pub-sub-serving/spark-redis-2.4.0-SNAPSHOT-jar-with-dependencies.jar` and I have the `pyspark` package and `SPARK_HOME` set in Run -> Environment variables in PyCharm... Still not working. But `spark-submit` works. Doesn't PyCharm start the Spark driver program by executing the `spark-submit` command? – Litchy Jul 12 '19 at 01:59
  • PyCharm just runs the python script, there is no specific integration with Spark. What exception/error do you get? – fe2s Jul 12 '19 at 05:48
  • `Failed to find data source: redis` — this error is fixed by adding the `spark-redis` jar to `spark-submit`. But how do I do the same thing in PyCharm (the same behavior as `--jars` in `spark-submit`)? – Litchy Jul 24 '19 at 07:44