I'm trying to create a table with spark.catalog.createTable. It needs to have a partition column named "id".
Based on How can I use "spark.catalog.createTable" function to create a partitioned table? (which is in Scala), I tried:
from pyspark.sql import functions as F

df = spark.range(10).withColumn("foo", F.lit("bar"))
spark.catalog.createTable("default.test_partition", schema=df.schema, **{"partitionColumnNames": "id"})
But it does not work. It creates a table in Hive with these properties:
CREATE TABLE default.test_partition ( id BIGINT, foo STRING )
WITH SERDEPROPERTIES ('partitionColumnNames'='id' ...
The DDL of the table should actually be:
CREATE TABLE default.test_partition ( foo STRING )
PARTITIONED BY ( id BIGINT )
WITH SERDEPROPERTIES (...
The signature of the method is:
Signature: spark.catalog.createTable(tableName, path=None, source=None, schema=None, **options)
So I believe there is a special argument in **options
to declare the partition, but I tried "partitionColumnNames", "partitionBy", "partition" ... and none of them works.
Do you know the proper keyword for that?
EDIT: If you are wondering why I want to use this method, there are 2 reasons:
- Personal curiosity, to know what is doable and what is not
- I want to use "dynamic partitions" with Spark, which requires the
insertInto
method (see Overwrite specific partitions in spark dataframe write method). But that method needs the table to be created first, and that is the step I want to perform with spark.catalog.createTable,
because it seems like the right tool for it.