
When using the Spark MongoDB connector in a Scala application, you can import the MongoSpark companion object via import com.mongodb.spark.config._ and then run val rdd = MongoSpark.load(spark) to load your collection. I want to do the same in a Python application, but how should I make the MongoSpark object available there? There is no Python package to install and import. What is the workaround?

yashar

1 Answer


Please see the Spark Connector Python Guide for more information.

Below is a short example connecting to MongoDB from PySpark:

from pyspark.sql import SparkSession

# Set the default read and write URIs for the MongoDB connector
spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll") \
    .getOrCreate()

# Load the collection into a DataFrame and inspect the inferred schema
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
df.printSchema()
Ross
  • It gives the exception: Py4JJavaError: An error occurred while calling o71.load. : java.lang.ClassNotFoundException: Failed to find data source: com.mongodb.spark.sql.DefaultSource. Please find packages at http://spark.apache.org/third-party-projects.html. – yashar Apr 26 '17 at 15:26
  • How should I make com.mongodb.spark.sql.DefaultSource available in a Python application, say in the Spyder IDE? – yashar Apr 26 '17 at 15:28
  • 1
    You need to include the jar / package. When running pyspark you can add: `--packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0` – Ross Apr 26 '17 at 16:51
  • 1
    I am using Spyder as the development IDE, is there any way to start Spyder with these .jar packages already available? – yashar Apr 26 '17 at 18:27
  • I found this link, which is very relevant: http://dataxwying.blogspot.nl/2016/02/setup-spyder-for-spark-step-by-step.html. Though I am still missing the MongoSpark object in Python. – yashar Apr 28 '17 at 12:52
  • For spark-submit: `./bin/spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0 --master spark://ip-or-domain-here:7077 sparkapps/test.py` – Phyticist Jun 17 '17 at 11:09
  • worked for me, thanks! – Artem Aug 30 '18 at 15:26
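
To address the Spyder question in the comments: a common workaround for IDEs is to set the PYSPARK_SUBMIT_ARGS environment variable before pyspark is imported, so the `--packages` flag takes effect when the JVM starts. A minimal sketch, assuming the same connector version used in the comments above:

```python
import os

# Set PYSPARK_SUBMIT_ARGS before any pyspark import so the connector
# package is downloaded and placed on the classpath when the JVM starts.
# The trailing "pyspark-shell" token is required by pyspark.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0 "
    "pyspark-shell"
)

# Then build the SparkSession exactly as in the answer; reading with
# format "com.mongodb.spark.sql.DefaultSource" should now resolve.
```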