0
  1. Created a new table in hive in partitioned and ORC format.
  2. Writing into this table using spark by using append ,orc and partitioned mode.

It fails with the exception:

org.apache.spark.sql.AnalysisException: The format of the existing table test.table1 is `HiveFileFormat`. It doesn't match the specified format `OrcFileFormat`.;
  1. I change the format to "hive" from "orc" while writing . It still fails with the exception : Spark not able to understand the underlying structure of table .

So this issue is happening because spark is not able to write into hive table in append mode , because it cant create a new table . I am able to do overwrite successfully because spark creates a table again.

But my use case is to write into append mode from starting. InsertInto also does not work specifically for partitioned tables. I am pretty much blocked with my use case. Any help would be great.

Edit1: Working on HDP 3.1.0 environment.

Spark Version is 2.3.2

Hive Version is 3.1.0

Edit 2:

// Reading the table 
val inputdf=spark.sql("select id,code,amount from t1")

//writing into table
inputdf.write.mode(SaveMode.Append).partitionBy("code").format("orc").saveAsTable("test.t2")

Edit 3: Using insertInto()

val df2 =spark.sql("select id,code,amount from t1")
df2.write.format("orc").mode("append").insertInto("test.t2");

I get the error as:

20/05/17 19:15:12 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
20/05/17 19:15:12 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
20/05/17 19:15:13 WARN AcidUtils: Cannot get ACID state for test.t1 from null
20/05/17 19:15:13 WARN AcidUtils: Cannot get ACID state for test.t1 from null
20/05/17 19:15:13 WARN HiveMetastoreCatalog: Unable to infer schema for table test.t1 from file format ORC (inference mode: INFER_AND_SAVE). Using metastore schema.

If I rerun the insertInto command I get the following exception :

20/05/17 19:16:37 ERROR Hive: MetaException(message:The transaction for alter partition did not commit successfully.)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_partitions_req_result$alter_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_partitions_req_result$alter_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)

Error in hive metastore logs :

2020-05-17T21:17:43,891 INFO  [pool-8-thread-198]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(907)) - 163: alter_partitions : tbl=hive.test.t1
2020-05-17T21:17:43,891 INFO  [pool-8-thread-198]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(349)) - ugi=X@A.ORG  ip=10.10.1.36   cmd=alter_partitions : tbl=hive.test.t1
2020-05-17T21:17:43,891 INFO  [pool-8-thread-198]: metastore.HiveMetaStore (HiveMetaStore.java:alter_partitions_with_environment_context(5119)) - New partition values:[BR]
2020-05-17T21:17:43,913 ERROR [pool-8-thread-198]: metastore.ObjectStore (ObjectStore.java:alterPartitions(4397)) - Alter failed
org.apache.hadoop.hive.metastore.api.MetaException: Cannot change stats state for a transactional table without providing the transactional write state for verification (new write ID -1, valid write IDs null; current state null; new state {}
Aviral Kumar
  • 814
  • 1
  • 15
  • 40

1 Answers1

0

I was able to resolve the issue by using external tables in my use case. We currently have an open issue in spark , which is related to acid properties of hive . Once I create hive table in external mode , I am able to do append operations in partitioned/non partitioned table. https://issues.apache.org/jira/browse/SPARK-15348

Aviral Kumar
  • 814
  • 1
  • 15
  • 40