
Below is the code I have written to connect to an RDBMS, create a temporary table, execute a SQL query against that temp table, and save the query output to .csv format via the databricks spark-csv module.

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext("local", "Simple App")
# HiveContext (a superset of SQLContext) is needed for Hive table access
sqlContext = HiveContext(sc)

df = sqlContext.read.format("jdbc") \
    .option("url", "jdbc:sqlserver://<server>:<port>") \
    .option("databaseName", "xxx") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("dbtable", "xxxx") \
    .option("user", "xxxxx") \
    .option("password", "xxxxx") \
    .load()

df.registerTempTable("test")

df1 = sqlContext.sql("select * from test where xxx = 6")
df1.write.format("com.databricks.spark.csv").save("/xxxx/xxx/ami_saidulu")

df1.write.option("path", "/xxxx/xxx/ami_saidulu").saveAsTable("HIVE_DB.HIVE_TBL", format='csv', mode='Append')

Where HIVE_DB is an existing Hive database and HIVE_TBL is an existing Hive table.

After I execute the code, I get the error below:

py4j.protocol.Py4JJavaError: An error occurred while calling o68.saveAsTable.
: java.lang.RuntimeException: Append mode is not supported by com.databricks.spark.csv.DefaultSource15

Does that mean the databricks module doesn't support the saveAsTable function?

If yes, please point out the mistakes in my code. If no, what is the solution/workaround/industry standard?

Spark 1.6.1

user3521180
  • Is the Hive table external or managed? Is there a reason you write a CSV in between? Can't you just do df.write.mode('append').saveAsTable('HIVE_DB.HIVE_TBL')? – kutschkem Dec 22 '17 at 09:21
  • It's a managed table, and yes, I got your point. After making the changes as per your suggestion, when I execute the code I get the error below: "SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation" – user3521180 Dec 22 '17 at 09:42
  • Is that an error or a warning? I would assume the code should still work, just not log anything. – kutschkem Dec 22 '17 at 09:46
  • It is reported as an error, but I believe the code is right. I did come across a Stack Overflow solution, "https://stackoverflow.com/questions/7421612/slf4j-failed-to-load-class-org-slf4j-impl-staticloggerbinder". It says we need to install the SLF4J jars; I checked, and those jars are not present. – user3521180 Dec 22 '17 at 09:56
  • Missing SLF4J bindings is just a warning... Are you trying to do incremental imports to Hive? – OneCricketeer Dec 22 '17 at 13:35

2 Answers


Assuming your table is managed: just do df.write.mode('append').saveAsTable('HIVE_DB.HIVE_TBL'); there is no need to go through an intermediate CSV file.

What this error means is that the databricks CSV module does not support append mode; there is an open issue on GitHub about it. So the solution is to avoid using the CSV format with append mode.
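A minimal sketch of that direct write path, assuming sqlContext is the HiveContext from the question and HIVE_DB.HIVE_TBL already exists as a managed table:

df1 = sqlContext.sql("select * from test where xxx = 6")
# Append straight into the managed Hive table; no intermediate CSV step,
# so the databricks CSV source (and its append limitation) is never involved
df1.write.mode("append").saveAsTable("HIVE_DB.HIVE_TBL")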

kutschkem

I can suggest another solution.

You can use the INSERT functionality to insert into the table:

sqlContext.sql("INSERT INTO/OVERWRITE TABLE HIVE_DB.HIVE_TBL select * from test where xxx= 6")

I hope this solution helps you; you can write directly into the table. Why do you want to write to CSV and then write into the table?

Even if you want a delimited text file at the table path, just define the table as a TextFile table with the required delimiter. The files at the table path will then be delimited after the insert, as in the sketch below.
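A rough sketch of that TextFile approach; the table name, columns, and comma delimiter below are placeholders, not taken from the question:

# Hypothetical delimited table; adjust columns and delimiter to your schema
sqlContext.sql("""
    CREATE TABLE IF NOT EXISTS HIVE_DB.HIVE_TBL_TXT (col1 INT, col2 STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
""")

# Rows inserted now land as comma-delimited text files under the table path
sqlContext.sql("INSERT INTO TABLE HIVE_DB.HIVE_TBL_TXT select * from test where xxx = 6")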

Thanks

Manu Gupta