I want to update some lines in my hive table. As pyspark doesn't recognize UPDATE at all I went for DELETE and INSERT, but am getting "Operation not allowed" on the DELETE operation.
To solve this I specified the table as orc and tried the rest of the requirements mentioned on this site: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Limitations
I did also set the tableproperty "transactional"="true". Below you will see some code how I tried setting the properties
sqlCtx.sql("""SET spark.hadoop.hive.support.concurrency=true""")
sqlCtx.sql("""SET spark.hadoop.hive.enforce.bucketing=true""")
sqlCtx.sql("""SET spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict""")
sqlCtx.sql("""SET spark.hadoop.hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager""")
sqlCtx.sql("""SET spark.hadoop.hive.lock.manager=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager""")
sqlCtx.sql("""SET spark.hadoop.hive.compactor.initiator.on=true""")
sqlCtx.sql("""SET spark.hadoop.hive.compactor.worker.threads=1""")
# Some other stuff happens creating the values etc
# Then I create the table from another table as orc
sqlCtx.sql("CREATE TABLE " + name + " AS SELECT * FROM new_base AS orc")
sqlCtx.sql("ALTER TABLE " + name + """ SET TBLPROPERTIES("transactional"="true")""")
# This will now result in Operation not allowed
sqlCtx.sql("DELETE FROM " + name) # I didn't keep the Where clause as it makes no difference so the error is not in the missing Where clause
I expected the DELETE clause to do something, at least Throwing an error due to the missing Where clause, but I only get pyspark.sql.utils.ParseException: '\nOperation not allowed: DELETE FROM ...
If a complete code example with the creation of a table is more helpful write so in the comments and I will add it, I kept it away for better readibility. I should also note, that this is run completely locally.