
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.catalog.Catalog

There is an `options` parameter, but I couldn't find any sample that uses it to pass the partition columns.
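For reference, a minimal sketch of what the `options` parameter covers today (the table name and path below are hypothetical); there is no option key for partition columns:

  // Creates an external table over the files at the given path, but offers
  // no way to declare partition columns explicitly
  spark.catalog.createTable(
    "events",                        // hypothetical table name
    "parquet",                       // data source
    Map("path" -> "/data/events")    // the options map in question
  )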

Guy Cohen
  • Checked Spark sources. It looks like in Spark 2.4 and earlier it is still impossible to create partitioned tables using `org.apache.spark.sql.catalog.Catalog`. – Dmitry Y. Jan 22 '19 at 15:45
  • Thanks @DmitryY. I also checked and found only the option parameter ... Meanwhile I switched to raw SQL with spark.sql – Guy Cohen Jan 23 '19 at 17:02
  • I created [SPARK-31001](https://issues.apache.org/jira/browse/SPARK-31001) to request that this ability be added. – Nick Chammas Mar 01 '20 at 19:04
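A sketch of the raw-SQL workaround mentioned in the comments above, assuming Hive support is enabled and using hypothetical table, column, and path names:

  spark.sql("""
    CREATE EXTERNAL TABLE events (id BIGINT, payload STRING)
    PARTITIONED BY (dt STRING)
    STORED AS PARQUET
    LOCATION '/data/events'
  """)
  // Register the partitions that already exist at the location
  spark.sql("MSCK REPAIR TABLE events")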

1 Answer


I believe you don't need to specify partition columns if you don't provide a schema; in that case Spark infers both the schema and the partitioning from the location automatically. However, the current implementation doesn't let you provide both a schema and partitioning. Fortunately, all of the underlying implementation code is open, so I ended up with the following method for creating external Hive tables.

  import org.apache.spark.sql.SaveMode
  import org.apache.spark.sql.catalyst.TableIdentifier
  import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}
  import org.apache.spark.sql.execution.datasources.{CreateTable, DataSource}
  import org.apache.spark.sql.types.StructType

  private def createExternalTable(tableName: String, location: String,
      schema: StructType, partitionCols: Seq[String], source: String): Unit = {
    val tableIdent = TableIdentifier(tableName)
    // Build the storage descriptor from the external location
    val storage = DataSource.buildStorageFormatFromOptions(Map("path" -> location))
    // Describe the table: external, with an explicit schema, partition columns and provider
    val tableDesc = CatalogTable(
      identifier = tableIdent,
      tableType = CatalogTableType.EXTERNAL,
      storage = storage,
      schema = schema,
      partitionColumnNames = partitionCols,
      provider = Some(source)
    )
    // Build and run the CreateTable plan; toRdd forces execution
    val plan = CreateTable(tableDesc, SaveMode.ErrorIfExists, None)
    spark.sessionState.executePlan(plan).toRdd
  }
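
A usage sketch, assuming a hypothetical Parquet dataset at `/data/events` partitioned by `dt` (note that the partition column must be part of the schema):

  import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

  val schema = StructType(Seq(
    StructField("id", LongType),
    StructField("payload", StringType),
    StructField("dt", StringType)  // partition column, included in the schema
  ))
  createExternalTable("events", "/data/events", schema, Seq("dt"), "parquet")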
Mikita Harbacheuski