I'm new to Spark and I'm trying to use PySpark to connect to Hive, run a query, load the result into a DataFrame, and then write that data to Couchbase. The examples I've seen create a Spark context for each data source in order to connect to it, but I am only able to create one context per script/session. What is the best practice for moving a set of data from one data source to another using Spark?
- You can essentially create only one SparkContext, and the same one has to be used throughout your code no matter how many databases you are connecting to (see the sketch below). – toofrellik Aug 28 '19 at 04:08
- A bit old, so there may be better solutions now, but this could help you: https://stackoverflow.com/questions/32714396/querying-on-multiple-hive-stores-using-apache-spark – Shaido Aug 28 '19 at 05:33
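
For reference, a minimal sketch of the single-session approach the first comment describes. The database, table, bucket, and host names are placeholders, and the Couchbase config keys and format string are assumptions based on the Couchbase Spark Connector 2.x conventions; verify them against the connector docs for your versions, and make sure the connector JAR is available (e.g. via `--packages`):

```python
from pyspark.sql import SparkSession

# A single SparkSession (and its one underlying SparkContext) serves both
# connections. enableHiveSupport() lets spark.sql() query Hive tables
# directly; the spark.couchbase.* keys below are assumed connector 2.x
# settings -- check the docs for your connector version.
spark = (
    SparkSession.builder
    .appName("hive-to-couchbase")
    .enableHiveSupport()
    .config("spark.couchbase.nodes", "couchbase-host")    # assumed host
    .config("spark.couchbase.username", "user")
    .config("spark.couchbase.password", "secret")
    .config("spark.couchbase.bucket.my-bucket", "")       # open target bucket
    .getOrCreate()
)

# Read from Hive with plain SQL; database and table names are placeholders.
df = spark.sql("SELECT id, name, amount FROM my_db.my_table")

# Write the same DataFrame out through the connector's DataFrame source.
# "idField" names the column to use as the Couchbase document key.
(
    df.write
    .format("com.couchbase.spark.sql.DefaultSource")
    .option("idField", "id")
    .save()
)
```

The key point is that nothing here needs a second context: the Hive read and the Couchbase write both go through the same session, with the connector configured up front on the builder.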