
Even after reinstalling pyspark and snappydata, I get an error whenever I try to run `from pyspark.sql.snappy import SnappyContext` in the code below:

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.storagelevel import StorageLevel
from pyspark.sql.snappy import SnappyContext
from pyspark.sql.snappy import SnappySession

SparkContext._ensure_initialized()

spark = SparkSession.builder \
    .appName("test") \
    .master("local[*]") \
    .config("spark.snappydata.connection", "localhost:1527") \
    .getOrCreate()

snappy = SnappySession(spark)
snappy.sql("SELECT col1, min(col2) from TABLE1")

I get this error:

Traceback (most recent call last):
  File "testpy.py", line 4, in <module>
    from pyspark.sql.snappy import SnappyContext
ImportError: No module named snappy

Please help!

techie95
  • Have you added snappydata as a dependency as described [here](https://snappydatainc.github.io/snappydata/quickstart/getting_started_with_your_spark_distribution/)? (assuming Spark version >= 2.1.1) Also, how are you running the pyspark script? Locally, using `spark-submit`? – mkaran Oct 31 '17 at 10:17
  • I'm new to this. I'm just running the python script in shell. – techie95 Oct 31 '17 at 10:42

1 Answer


This was a known problem in the last released version; it has been fixed in the latest master. $SNAPPY_HOME/bin/pyspark refers to the Python scripts inside the $SNAPPY_HOME/pyspark folder, but unfortunately some build changes stopped copying the SnappyData Python scripts into that folder. You can build the current master to work with pyspark.
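For reference, once you are on a build that ships the SnappyData Python scripts, the setup from the question is expected to look roughly like the sketch below. This is a minimal, untested sketch: it follows the SparkContext-based construction of SnappySession; if your build's SnappySession accepts a SparkSession directly (as in the question's code), adapt accordingly. The connection string and TABLE1 are taken from the question, and the GROUP BY is added here only so the aggregate query is valid SQL.

from pyspark import SparkContext, SparkConf
from pyspark.sql.snappy import SnappySession  # resolves once the SnappyData Python scripts are in place

# Spark configuration pointing at the local SnappyData cluster (locator on port 1527, as in the question)
conf = SparkConf() \
    .setAppName("test") \
    .setMaster("local[*]") \
    .set("spark.snappydata.connection", "localhost:1527")
sc = SparkContext(conf=conf)

# SnappySession wraps the SparkContext and lets you run SQL against SnappyData tables
snappy = SnappySession(sc)
snappy.sql("SELECT col1, min(col2) FROM TABLE1 GROUP BY col1").show()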

  • Check out these docs to build from source http://snappydatainc.github.io/snappydata/install/building_from_source/ – plamb Oct 31 '17 at 15:42