The following command successfully converted external tables to managed tables in Spark 2.0.0:

ALTER TABLE {table_name} SET TBLPROPERTIES(EXTERNAL=FLASE);

However, the same command fails in Spark 2.2.0 with the following error:

Error in query: Cannot set or change the preserved property key: 'EXTERNAL';

Amit Kumar
oushnik dey
    Hey @oushnik-dey were you able to find a solution for this? I am trying to accomplish the same. – hulin003 Mar 09 '18 at 21:54
  • I'm on Spark 2.1.0 btw. – hulin003 Mar 09 '18 at 22:02
  • I was able to do it using PySpark; please see the answer here: https://stackoverflow.com/questions/57520227/how-to-delete-a-particular-month-from-a-parquet-file-partitioned-by-month/57615684#57615684 – vikrant rana Aug 22 '19 at 19:38

5 Answers

In Spark 2.2.0 you can do the following:

import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogTableType

// Look up the existing table metadata, copy it with tableType set to
// MANAGED, and write it back through the session catalog.
val identifier = TableIdentifier("table", Some("database"))
val oldTable = spark.sessionState.catalog.getTableMetadata(identifier)
val alteredTable = oldTable.copy(tableType = CatalogTableType.MANAGED)

spark.sessionState.catalog.alterTable(alteredTable)
Joha

As @AndyBrown pointed out in a comment, you have the option of dropping to the console and invoking the Hive statement there. In Scala this worked for me:

import sys.process._
val exitCode = Seq("hive", "-e", "ALTER TABLE {table_name} SET TBLPROPERTIES(\"EXTERNAL\"=\"FALSE\")").!

I faced this problem using Spark 2.1.1 where @Joha's answer does not work because spark.sessionState is not accessible due to being declared lazy.

The issue is case sensitivity in Spark 2.1 and above.

Try setting TBLPROPERTIES in lower case:

    ALTER TABLE <TABLE NAME> SET TBLPROPERTIES('external'='false')
  • Although you have corrected the quoting and spelling of false, putting it in lower case just adds a user property "external" set to "false" and leaves the preserved property "EXTERNAL" as "TRUE" (and retains the table's external state). You can see this by running "DESCRIBE FORMATTED <table_name>". I have the same problem as the OP and cannot fix it (I will post an answer if I find one), but this is not the answer. – AndyBrown Aug 08 '18 at 14:14
  • @AndyBrown did you find out how to resolve setting external = false? I am getting the error you described: org.apache.spark.sql.AnalysisException: Cannot set or change the preserved property key: 'EXTERNAL'; – donald Aug 16 '18 at 11:44
  • @donald Sort of. I found a workaround that was so specific to my use case I didn't consider it a suitable answer to this question: because I'm operating in PySpark, I dropped to the shell and called the Hive engine directly to execute the SET command, which worked because the error only occurred in the Spark context. Happy to expound in chat if you think this might be useful. – AndyBrown Aug 20 '18 at 11:36
  • Hey @AndyBrown could you please elaborate on how you've worked around this issue? Please post your answer/code as I'm encountering the same issue. Thanks! – GeoSal May 05 '20 at 13:13
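
AndyBrown's workaround from the comments (dropping out of PySpark and calling the Hive engine directly) might look roughly like the following Python sketch. The table name is a placeholder and the assumption that the hive CLI is on the PATH is mine; the command is built in a separate helper so it can be inspected before running:

```python
import subprocess

def build_hive_command(table_name):
    # Build the hive -e invocation that flips EXTERNAL to FALSE outside
    # the Spark session, where the property key is not reserved.
    statement = (
        'ALTER TABLE {0} SET TBLPROPERTIES("EXTERNAL"="FALSE")'.format(table_name)
    )
    return ["hive", "-e", statement]

def convert_to_managed(table_name):
    # Shells out to the hive CLI; requires hive to be on the PATH.
    return subprocess.call(build_hive_command(table_name))

if __name__ == "__main__":
    print(build_hive_command("mydb.mytable"))
```

This sidesteps Spark's check entirely, since the restriction on the EXTERNAL key is enforced by Spark's catalog layer, not by Hive itself.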

I had the same issue while using a Hive external table. I solved the problem by directly setting the property external to false in the Hive metastore, using a Hive metastore client:

 import org.apache.hadoop.hive.metastore.api.Table;

 // Fetch the table definition, flip the EXTERNAL parameter, and write it back.
 Table table = hiveMetaStoreClient.getTable("db", "table");
 table.putToParameters("EXTERNAL", "FALSE");
 hiveMetaStoreClient.alter_table("db", "table", table, true);
Arun T

I tried the above option from a Scala Databricks notebook, and the external table was converted to a MANAGED table. The good part is that DESCRIBE FORMATTED from Spark on the new table still shows the location to be on my ADLS. This was one limitation Spark had: we cannot specify the location for a managed table.

As of now I am able to do a TRUNCATE TABLE on it. Hopefully a more direct option for creating a managed table with a specified location will be added to Spark SQL.

Siong Thye Goh