3

Attempting to load data into Databricks using COPY INTO, I have data in storage (as CSV files) that has the following schema:

event_time TIMESTAMP,
aws_region STRING,
event_id STRING,
event_name STRING

I wish for the target table to be partitioned by DAY, which should be extracted from the event_time column.

However, attempting to use an expression in the PARTITIONED BY column yields the following error:

CREATE TABLE IF NOT EXISTS MY_TABLE (
  event_time TIMESTAMP,
  aws_region STRING,
  event_id STRING,
  event_name STRING
)
PARTITIONED BY(TO_DATE(event_time))

java.sql.SQLException: [Databricks]DatabricksJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: [DELTA_OPERATION_NOT_ALLOWED] com.databricks.sql.transaction.tahoe.DeltaAnalysisException: Operation not allowed: Partitioning by expressions is not supported for Delta tables

My limitation is that I cannot alter the existing schema, for example by creating a new field:

event_date DATE

And use that as the partition by column.

Is there any other way to overcome this?

Yuval Itzchakov
  • 146,575
  • 32
  • 257
  • 321

1 Answers1

0

You need to use GENERATED column like this (doc):

CREATE TABLE IF NOT EXISTS MY_TABLE (
  event_time TIMESTAMP,
  event_date date GENERATED ALWAYS AS (TO_DATE(event_time)),
  aws_region STRING,
  event_id STRING,
  event_name STRING
)
PARTITIONED BY(event_date)

You can read more about generated columns in the linked documentation or here.

Alex Ott
  • 80,552
  • 8
  • 87
  • 132