Below is the code that I have written to connect to a RDBMS, then create temp table , execute SQL query on that temp table, saving the SQL query output to a .csv format through databricks module.
from pyspark import SparkContext
sc = SparkContext("local", "Simple App")
from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
df = sqlContext.read.format("jdbc").option("url","jdbc:sqlserver://<server>:<port>").option("databaseName","xxx").option("driver","com.microsoft.sqlserver.jdbc.SQLServerDriver").option("dbtable","xxxx").option("user","xxxxx").option("password","xxxxx").load()
df.registerTempTable("test")
df1= sqlContext.sql("select * from test where xxx= 6")
df1.write.format("com.databricks.spark.csv").save("/xxxx/xxx/ami_saidulu")
df1.write.option("path", "/xxxx/xxx/ami_saidulu").saveAsTable("HIVE_DB.HIVE_TBL",format= 'csv',mode= 'Append')
Where HIVE.DB is an existing HIVE DATABASE HIVE.TBL is an existing HIVE TABLE
after I execute the code, I am getting below error:
py4j.protocol.Py4JJavaError: An error occurred while calling o68.saveAsTable.
: java.lang.RuntimeException: Append mode is not supported by com.databricks.spark.csv.DefaultSource15
Does that mean, the databricks module doesn't support "saveAsTable" function?
If yes, then please point out the mistakes in my code. If no, then what is the solution/work around/industry standards ?
Spark 1.6.1