
I'm trying to save a DataFrame into a Hive table.

In Spark 1.6 it works, but after migrating to 2.2.0 it no longer does.

Here's the code:

blocs
      .toDF()
      .repartition($"col1", $"col2", $"col3", $"col4")
      .write
      .format("parquet")
      .mode(saveMode)
      .partitionBy("col1", "col2", "col3", "col4")
      .saveAsTable("db.tbl")

It fails with:

    org.apache.spark.sql.AnalysisException: The format of the existing table project_bsc_dhr.bloc_views is HiveFileFormat. It doesn't match the specified format ParquetFileFormat.;
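To see why Spark complains, you can inspect how the existing table was created. A minimal sketch, assuming a `SparkSession` named `spark` with Hive support enabled and the table name from the question:

```scala
// DESCRIBE FORMATTED shows the table's provider/SerDe metadata, which
// reveals whether it is a Hive SerDe table or a Spark datasource (parquet) table.
spark.sql("DESCRIBE FORMATTED project_bsc_dhr.bloc_views")
  .show(100, truncate = false)
```

If the provider is `hive` rather than `parquet`, `saveAsTable` with `.format("parquet")` will refuse to write into it, which is the error above.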

  • Have you found a solution? I am facing the same issue. Can you please let me know the workaround? – BigD Feb 08 '19 at 11:42
  • Yes, I used insertInto instead of saveAsTable and removed partitionBy. The code: blocs .toDF() .repartition($"col1", $"col2", $"col3", $"col4") .write .format("parquet") .insertInto("db.tbl") – youssef grati Feb 09 '19 at 12:07
  • I am using Spark 2.3.0. Does repartition work on the latest Spark? – BigD Feb 09 '19 at 15:34

1 Answer


After getting the error, I tried passing .format("hive") to saveAsTable, and it worked.
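A minimal sketch of that fix, reusing the DataFrame and columns from the question (the table name `db.tbl` is a placeholder):

```scala
blocs
  .toDF()
  .repartition($"col1", $"col2", $"col3", $"col4")
  .write
  .format("hive")            // match the existing HiveFileFormat table
  .mode(saveMode)
  .partitionBy("col1", "col2", "col3", "col4")
  .saveAsTable("db.tbl")
```

With `.format("hive")`, Spark writes through the Hive SerDe instead of its own datasource path, so the declared format matches the existing table.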

I would also not recommend using insertInto as suggested by the question's author, because it is not type-safe (as much as that term can apply to the SQL API) and error-prone: it ignores column names and resolves columns by position.

  • How do I insert only specific columns from the DataFrame into the Hive table? Say my table has 50 columns, but my DataFrame has only the 20 columns I want to update/insert; consider those 20 required and the others optional. With the above, I get a position/column mismatch kind of error. – Ak777 Sep 17 '20 at 14:41
  • Your solution `.format('hive')` works when the table is not partitioned. If it's partitioned, I am getting a different error `org.apache.spark.SparkException: Requested partitioning does not match the` after switching from `.format('parquet')`. – John Jiang Feb 13 '23 at 01:41
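For the column-subset question in the comments: since insertInto resolves columns by position, one way is to reorder the DataFrame to the table's schema and pad the columns it lacks with typed nulls. A sketch, assuming a hypothetical table `db.tbl` and DataFrame `df` (names are placeholders, not from the original post):

```scala
import org.apache.spark.sql.functions.lit

val tableSchema = spark.table("db.tbl").schema        // full 50-column schema
val dfCols      = df.columns.toSet                    // the 20 columns we have

// Select every table column in table order; columns missing from the
// DataFrame become nulls cast to the table's column type.
val aligned = df.select(tableSchema.map { field =>
  if (dfCols.contains(field.name)) df(field.name)
  else lit(null).cast(field.dataType).as(field.name)
}: _*)

aligned.write.mode("append").insertInto("db.tbl")
```

Casting the null literals to each column's declared type avoids schema-mismatch errors that an untyped `NullType` column would cause on insert.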