I would like to merge the output files to roughly 128 MB each when writing a Hive table. In Spark, I have set the following properties, but the small files are still not merged. Can someone give me a suggestion?
val spark = SparkSession.builder
.appName("MyExample")
.master("local[*]")
.enableHiveSupport()
.getOrCreate()
spark.sqlContext.setConf("hive.mapred.supports.subdirectories", "true")
spark.sqlContext.setConf("mapreduce.input.fileinputformat.input.dir.recursive", "true")
spark.sqlContext.setConf("hive.hadoop.supports.splittable.combineinputformat", "true")
spark.sqlContext.setConf("hive.exec.compress.output", "false")
spark.sqlContext.setConf("hive.input.format", "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat")
spark.sqlContext.setConf("hive.merge.mapfiles", "true")
spark.sqlContext.setConf("hive.merge.mapredfiles", "true")
spark.sqlContext.setConf("hive.merge.size.per.task", "128000000")
spark.sqlContext.setConf("hive.merge.smallfiles.avgsize", "128000000")
spark.sqlContext.setConf("hive.groupby.skewindata", "true")
spark.sqlContext.setConf("hive.merge.sparkfiles", "true")
spark.sqlContext.setConf("hive.merge.mapfiles", "true")
// Read the raw CSV and write it back out as a Hive table.
val df = spark.read.format("csv")
  .option("header", "false")
  .load(path)
df.write.format("csv").saveAsTable("test_table")
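
For comparison, when I repartition manually based on the input size, I do get files of roughly 128 MB, but I was hoping the Hive merge settings above would do this automatically. This is just my own sketch: the size estimate via getContentSummary, the 128 MB target, and the table name test_table_repartitioned are my assumptions, not from any documentation.

import org.apache.hadoop.fs.{FileSystem, Path}

// Estimate the total input size and derive the number of ~128 MB partitions.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val totalBytes = fs.getContentSummary(new Path(path)).getLength
val numPartitions = math.max(1, (totalBytes / (128L * 1024 * 1024)).toInt)

// Writing with that partition count gives files of about the target size,
// assuming the rows are evenly distributed across partitions.
df.repartition(numPartitions)
  .write.format("csv")
  .saveAsTable("test_table_repartitioned") // hypothetical table name for the experiment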