24

I have the following problem in Azure Databricks. Sometimes when I try to save a DataFrame as a managed table:

SomeData_df.write.mode('overwrite').saveAsTable("SomeData")

I get the following error:

"Can not create the managed table('SomeData'). The associated location('dbfs:/user/hive/warehouse/somedata') already exists.;"

I used to fix this problem by running a %fs rm command to remove that location but now I'm using a cluster that is managed by a different user and I can no longer run rm on that location.
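
The command was something along these lines (the path is the one from the error message):

%fs rm -r dbfs:/user/hive/warehouse/somedata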

For now the only fix I can think of is using a different table name.

What makes things even more peculiar is the fact that the table does not exist. When I run:

%sql
SELECT * FROM SomeData

I get the error:

Error in SQL statement: AnalysisException: Table or view not found: SomeData;

How can I fix it?

BuahahaXD
  • 609
  • 2
  • 8
  • 24

7 Answers

22

Seems there are a few others with the same issue.

A temporary workaround is to use

dbutils.fs.rm("dbfs:/user/hive/warehouse/SomeData/", true)

to remove the table's underlying location before re-creating it.
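
In a Python cell the full workaround would look roughly like this (DataFrame and table name taken from the question; note the boolean is True rather than true in Python):

dbutils.fs.rm("dbfs:/user/hive/warehouse/somedata/", True)   # recursively delete the leftover warehouse location
SomeData_df.write.mode('overwrite').saveAsTable("SomeData")  # then re-create the managed table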

char
  • 2,063
  • 3
  • 15
  • 26
11

This generally happens when a cluster is shut down while writing a table. The recommended solution, per the Databricks documentation, is to set the following flag:

This flag deletes the _STARTED directory and returns the process to the original state. For example, you can set it in the notebook:

%py
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")
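
With the flag set, the original write from the question should go through again, e.g.:

SomeData_df.write.mode('overwrite').saveAsTable("SomeData")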
Mike
  • 384
  • 3
  • 9
9

All of the other recommended solutions here are either workarounds or do not work. The mode is specified as overwrite, meaning you should not need to delete or remove the db or use legacy options.

Instead, try specifying the fully qualified path in the options when writing the table:

df.write \
    .option("path", "hdfs://cluster_name/path/to/my_db") \
    .mode("overwrite") \
    .saveAsTable("my_db.my_table")
Brendan
  • 1,905
  • 2
  • 19
  • 25
5

For a more context-free answer, run this in your notebook:

dbutils.fs.rm("dbfs:/user/hive/warehouse/SomeData", recurse=True)

Per Databricks's documentation, this will work in a Python or Scala notebook, but you'll have to use the magic command %python at the beginning of the cell if you're using an R or SQL notebook.
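
For example, in a SQL or R notebook the cell would look something like this:

%python
dbutils.fs.rm("dbfs:/user/hive/warehouse/SomeData", recurse=True)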

David Maddox
  • 1,884
  • 3
  • 21
  • 32
0

This is caused by restarting the kernel while a write operation is in progress. Remove the files if dropping the table does not work:

dbutils.fs.rm("dbfs:/user/hive/warehouse/SomeData", recurse=True)

Or, as Mike said:

This flag deletes the _STARTED directory and returns the process to the original state. For example, you can set it in the notebook

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")

But this flag was removed in Spark 3.0.

Blue Clouds
  • 7,295
  • 4
  • 71
  • 112
-1

I have the same issue. I am using:

create table if not exists USING delta

If I first delete the files like suggested, it creates the table once, but the second time the problem repeats. It seems CREATE TABLE IF NOT EXISTS does not recognize the existing table and tries to create it anyway.

I don't want to delete the table every time; I'm actually trying to use MERGE and keep the table.
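
For reference, the pattern being described looks roughly like this (a sketch with hypothetical column names and a hypothetical 'updates' source view):

spark.sql("CREATE TABLE IF NOT EXISTS SomeData (id INT, value STRING) USING DELTA")
spark.sql("""
    MERGE INTO SomeData AS t
    USING updates AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")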

OHabushi
  • 14
  • 2
  • This happened for me when trying to create a managed table using CTAS from another table. I created a new cell in a Databricks notebook and used this configuration and it worked: `%python spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")`. Further details can be seen in the Databricks knowledge base: "https://kb.databricks.com/jobs/spark-overwrite-cancel.html" – Vivek Apr 25 '22 at 02:06
-2

Well, this happens because you're trying to write data to the default location (without specifying the 'path' option) with the mode 'overwrite'. As Mike said, you can set "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation" to "true", but this option was removed in Spark 3.0.0. If you try to set this option in Spark 3.0.0 you will get the following exception:

Caused by: org.apache.spark.sql.AnalysisException: The SQL config 'spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation' was removed in the version 3.0.0. It was removed to prevent loosing of users data for non-default value.;

To avoid this problem, explicitly specify the path you want to save to when using the 'overwrite' mode.
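
For example, a sketch reusing the DataFrame from the question (the path here is just a placeholder; any writable DBFS location would do):

SomeData_df.write \
    .option("path", "dbfs:/mnt/somedata") \
    .mode("overwrite") \
    .saveAsTable("SomeData")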

Alex
  • 580
  • 5
  • 8