24

I have the following problem in Azure Databricks. Sometimes when I try to save a DataFrame as a managed table:

SomeData_df.write.mode('overwrite').saveAsTable("SomeData")

I get the following error:

"Can not create the managed table('SomeData'). The associated location('dbfs:/user/hive/warehouse/somedata') already exists.;"

I used to fix this problem by running a %fs rm command to remove that location but now I'm using a cluster that is managed by a different user and I can no longer run rm on that location.
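
The command was something along these lines (the path is the one from the error message):

%fs rm -r dbfs:/user/hive/warehouse/somedata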

For now the only fix I can think of is using a different table name.

What makes things even more peculiar is the fact that the table does not exist. When I run:

%sql
SELECT * FROM SomeData

I get the error:

Error in SQL statement: AnalysisException: Table or view not found: SomeData;

How can I fix it?

BuahahaXD
  • 609
  • 2
  • 8
  • 24

7 Answers

22

Seems there are a few others with the same issue.

A temporary workaround is to use

dbutils.fs.rm("dbfs:/user/hive/warehouse/SomeData/", true)

to remove the table's underlying location before re-creating it.
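
In a Python cell the full workaround would look roughly like this (DataFrame and table name taken from the question; note the boolean is True rather than true in Python):

dbutils.fs.rm("dbfs:/user/hive/warehouse/somedata/", True)   # recursively delete the leftover warehouse location
SomeData_df.write.mode('overwrite').saveAsTable("SomeData")  # then re-create the managed table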

char
  • 2,063
  • 3
  • 15
  • 26
11

This generally happens when a cluster is shut down while writing a table. The recommended solution, per the Databricks documentation, is to set the following flag:

This flag deletes the _STARTED directory and returns the process to the original state. For example, you can set it in the notebook:

%py
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")
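
With the flag set, the original write from the question should go through again, e.g.:

SomeData_df.write.mode('overwrite').saveAsTable("SomeData")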
Mike
  • 384
  • 3
  • 9
9

All of the other recommended solutions here are either workarounds or do not work. The mode is specified as overwrite, meaning you should not need to delete or remove the db or use legacy options.

Instead, try specifying the fully qualified path in the options when writing the table:

df.write \
    .option("path", "hdfs://cluster_name/path/to/my_db") \
    .mode("overwrite") \
    .saveAsTable("my_db.my_table")
Brendan
  • 1,905
  • 2
  • 19
  • 25
5

For a more context-free answer, run this in your notebook:

dbutils.fs.rm("dbfs:/user/hive/warehouse/SomeData", recurse=True)

Per Databricks's documentation, this will work in a Python or Scala notebook, but you'll have to use the magic command %python at the beginning of the cell if you're using an R or SQL notebook.
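
For example, in a SQL or R notebook the cell would look something like this:

%python
dbutils.fs.rm("dbfs:/user/hive/warehouse/SomeData", recurse=True)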

David Maddox
  • 1,884
  • 3
  • 21
  • 32
0

This is caused by restarting the kernel while a write operation is in progress. Remove the files if dropping the table does not work:

dbutils.fs.rm("dbfs:/user/hive/warehouse/SomeData", recurse=True)

Or, as Mike said:

This flag deletes the _STARTED directory and returns the process to the original state. For example, you can set it in the notebook

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")

But this flag was removed in Spark 3.0.

Blue Clouds
  • 7,295
  • 4
  • 71
  • 112
-1

I have the same issue. I am using:

create table if not exists USING delta

If I first delete the files like suggested, it creates the table once, but the second time the problem repeats. It seems CREATE TABLE IF NOT EXISTS does not recognize the existing table and tries to create it anyway.

I don't want to delete the table every time; I'm actually trying to use MERGE and keep the table.
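
For reference, the pattern being described looks roughly like this (a sketch with hypothetical column names and a hypothetical 'updates' source view):

spark.sql("CREATE TABLE IF NOT EXISTS SomeData (id INT, value STRING) USING DELTA")
spark.sql("""
    MERGE INTO SomeData AS t
    USING updates AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")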

OHabushi
  • 14
  • 2
  • This happened for me when trying to create a managed table using CTAS from another table. I created a new cell in a Databricks notebook and used this configuration and it worked: `%python spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")`. Further details can be seen in the Databricks knowledge base: "https://kb.databricks.com/jobs/spark-overwrite-cancel.html" – Vivek Apr 25 '22 at 02:06
-2

Well, this happens because you're trying to write data to the default location (without specifying the 'path' option) with the mode 'overwrite'. As Mike said, you can set "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation" to "true", but this option was removed in Spark 3.0.0. If you try to set this option in Spark 3.0.0 you will get the following exception:

Caused by: org.apache.spark.sql.AnalysisException: The SQL config 'spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation' was removed in the version 3.0.0. It was removed to prevent loosing of users data for non-default value.;

To avoid this problem, explicitly specify the path you want to save to when using the 'overwrite' mode.
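
For example, a sketch reusing the DataFrame from the question (the path here is just a placeholder; any writable DBFS location would do):

SomeData_df.write \
    .option("path", "dbfs:/mnt/somedata") \
    .mode("overwrite") \
    .saveAsTable("SomeData")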

Alex
  • 580
  • 5
  • 8