I'm trying to assemble a big data infrastructure on my local machine: MongoDB > Apache Spark > sparklyr in RStudio. I can't find a way to connect sparklyr to MongoDB. There are a few old posts on the internet, but none of them offers a working solution. The MongoDB connector documentation shows support for SparkR, but that package is no longer on CRAN.
With PySpark I was able to connect, and it works with the following configuration:
# import SparkSession from the pyspark package
from pyspark.sql import SparkSession
# initialise the Spark session with the MongoDB connector settings
my_spark = SparkSession \
    .builder \
    .appName("Analysis") \
    .config("spark.mongodb.read.connection.uri", "mongodb://127.0.0.1/safricadb.vacancy") \
    .config("spark.mongodb.write.connection.uri", "mongodb://127.0.0.1/safricadb.vacancy") \
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector:10.0.2") \
    .getOrCreate()
# load the collection into a Spark DataFrame
df = my_spark.read.format("com.mongodb.spark.sql.connector.MongoTableProvider").load()
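In sparklyr, my first thought was to translate these settings one-to-one, along the lines of the sketch below. The config keys, Maven coordinates, and provider class are simply copied from the working PySpark code above; spark_config(), sparklyr.defaultPackages, and spark_read_source() are my assumptions about the sparklyr-side equivalents of the builder config, spark.jars.packages, and read.format(...).load(), and I haven't been able to confirm that this actually works:

library(sparklyr)

# mirror the PySpark session settings in a sparklyr config
config <- spark_config()
config$spark.mongodb.read.connection.uri <- "mongodb://127.0.0.1/safricadb.vacancy"
config$spark.mongodb.write.connection.uri <- "mongodb://127.0.0.1/safricadb.vacancy"
# assumption: this pulls the connector from Maven, like spark.jars.packages in PySpark
config$sparklyr.defaultPackages <- "org.mongodb.spark:mongo-spark-connector:10.0.2"

sc <- spark_connect(master = "local", config = config)

# assumption: read through the same DataSource V2 provider class as in PySpark;
# database and collection should come from the connection URI, as they do there
vacancy_tbl <- spark_read_source(
  sc,
  name = "vacancy",
  source = "com.mongodb.spark.sql.connector.MongoTableProvider"
)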
Could someone offer me guidance on how I should approach this connection with sparklyr?
(Versions: MongoDB Community Server 5.0.9; Apache Spark 3.3.0 with Hadoop 2.7; Mongo Spark Connector 10.0.2.)