I have a Spark job that reads data from an external Hive table, applies some transformations, and saves the result into another, internal Hive table:
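import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

val sparkConf = new SparkConf().setAppName("Bulk Merge Daily Load Job")
val sparkContext = new SparkContext(sparkConf)
val sqlContext = new HiveContext(sparkContext)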
// Data Ingestion
val my_df = sqlContext.sql("select * from test")
// Transformation
...
...
// Save Data into Hive
my_df.write.format("orc")
  .option("orc.compress", "SNAPPY")
  .mode(SaveMode.Overwrite)
  .saveAsTable("my_internal_table")
The external table is created with this tblproperties line:
tblproperties ("skip.header.line.count"="1");
My problem is that the rows of my_internal_table include an additional row containing the column names.
I guess this is related to this issue :
I am using Spark 1.6.0.
Can you help me with this:
- Is this bug still occurring in 1.6.0?
- Is there any simple way to avoid this?
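To make the second question concrete, here is a minimal sketch of the kind of workaround I have in mind: filtering the leaked header row out on the Spark side before writing. It is not a confirmed fix. `HeaderFilter` and `dropHeaderRows` are hypothetical names, and the sketch assumes the chosen column is a string column whose header text equals its Hive column name (compared case-insensitively).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lower

object HeaderFilter {
  // Hypothetical helper, not part of Spark.
  // Drops every row whose value in `headerColumn` equals the column's own
  // name, which is what a header line read as data would look like.
  // Assumption: `headerColumn` is a string column, and no genuine data row
  // holds the column name as its value.
  def dropHeaderRows(df: DataFrame, headerColumn: String): DataFrame =
    // `<=>` is null-safe equality, so rows with a null value are kept.
    df.filter(!(lower(df(headerColumn)) <=> headerColumn.toLowerCase))
}
```

It would be called as `HeaderFilter.dropHeaderRows(my_df, "some_string_column")` before the write, where `some_string_column` is a placeholder for a real string column of the table. The filter runs per partition with no collect, so it should not be a problem for large inputs.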
PS: I am processing large files (> 10 GB).
Thanks in advance for your response.