13

How do you rename a column in Databricks?

The following does not work:

ALTER TABLE mySchema.myTable change COLUMN old_name new_name int

It returns the error:

ALTER TABLE CHANGE COLUMN is not supported for changing column 'old_name' with type 'IntegerType >(nullable = true)' to 'new_name' with type 'IntegerType (nullable = true)';

If it makes a difference, this table is using Delta Lake, and it is NOT partitioned or z-ordered by this "old_name" column.

David Maddox
  • 1,884
  • 3
  • 21
  • 32

3 Answers3

17

You can't rename or change a column datatype in Databricks, only add new columns, reorder them or add column comments. To do this you must rewrite the table using the overwriteSchema option.

Take this example below from this documentation:

spark.read.table(...)
  .withColumnRenamed("date", "date_created")
  .write
  .mode("overwrite")
  .option("overwriteSchema", "true")
  .table(...)
LeandroHumb
  • 843
  • 8
  • 23
  • Does using "overwriteSchema" perform faster than dropping the table and re-creating it? – David Maddox Dec 27 '19 at 15:44
  • I think the performance will be the same, but at least you can just execute all in one action, and if you are using Delta Lake, you can travel back in time – LeandroHumb Dec 27 '19 at 17:59
  • we don't need to do this anymore after the release of deltalake 0.7.0. we can use SQL now as i suggested in my next response to Alter/update a table. – mahmoud mehdi Nov 06 '20 at 10:46
  • 4
    Syntax has changed, so now you should look to Enayat's answer below. – haaduken Jul 02 '21 at 17:42
  • If you're using a recent runtime in Databricks, this is no longer the case. See Ispan Cristi's answer that uses a simple ALTER TABLE statement. – David Maddox Oct 07 '22 at 19:07
8

To be able to rename the column, overwriteSchema with saveAsTable should be used:

spark.read.table(Table_Name)
  .withColumnRenamed("currentName", "newName")
  .write
  .format("delta")
  .mode("overwrite")
  .option("overwriteSchema", "true")
  .saveAsTable("Table_Name")
Enayat
  • 3,904
  • 1
  • 33
  • 47
5

Recently has been published some modifications which allow to rename columns on DELTA TABLES in Databricks.

It is needed to set this properties on table:

ALTER TABLE <table_name> SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
)

Afterwards , you can rename the column as always.

ALTER TABLE <table_name> RENAME COLUMN old_col_name TO new_col_name 

Check this: https://docs.databricks.com/delta/delta-column-mapping.html

Other usefull links:

https://docs.databricks.com/delta/delta-batch.html#rename-columns-1

https://docs.databricks.com/delta/delta-batch.html#change-column-type-or-name

Cristian Ispan
  • 571
  • 2
  • 5
  • 23
  • This does not work for me. When setting the TBLPROPOERTIES, I get the following error message : Error in SQL statement: ParseException: no viable alternative at input 'ALTER TABLE '/my_dir/my_table''. Any idea why ? – TOMC Oct 07 '22 at 07:45
  • One of the requirements is to run on Databricks Runtime 10.2 or above. Check it. Thanks! – Cristian Ispan Oct 10 '22 at 07:50