
With mode('overwrite') set during a saveAsTable() operation:


df1.write.format('parquet').mode('overwrite').saveAsTable(
    'spark_no_bucket_table1')

Then why does saving the table fail with the following error?

pyspark.sql.utils.AnalysisException: Can not create the managed table('`spark_no_bucket_table1`').
The associated location('file:experiments/spark-warehouse/spark_no_bucket_table1') already exists.

1 Answer


From Spark's 2.4.0 migration guide:

Since Spark 2.4, creating a managed table with a nonempty location is not allowed. An exception is thrown when attempting to create a managed table with a nonempty location. Set spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation to true to restore the previous behavior. This option will be removed in Spark 3.0.

So if you use Spark version >= 2.4.0 and < 3.0.0, you can solve this by setting:

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")

For Spark version >= 3.0.0, the legacy flag no longer exists, so you will have to manually clean up the data directory specified in the error message before writing.
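A minimal sketch of that cleanup step, done in plain Python before calling `saveAsTable()` again. The helper name is hypothetical, and the warehouse path and table name are taken from the error message above; adjust both to your setup:

```python
import shutil
from pathlib import Path

def drop_stale_table_dir(warehouse_dir: str, table_name: str) -> None:
    """Delete the leftover managed-table directory, if any,
    so saveAsTable() can recreate it."""
    table_path = Path(warehouse_dir) / table_name
    if table_path.exists():
        shutil.rmtree(table_path)

# Path mirroring the one in the error message (an assumption):
drop_stale_table_dir("experiments/spark-warehouse", "spark_no_bucket_table1")

# The original write should then succeed:
# df1.write.format('parquet').mode('overwrite').saveAsTable('spark_no_bucket_table1')
```

Note this only works for a local filesystem warehouse; if your warehouse lives on HDFS or object storage, delete the directory with the corresponding filesystem tooling instead.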

Gabio
  • So then what is the purpose of `mode('overwrite')` ? – WestCoastProjects Nov 15 '22 at 07:33
  • It seems that it doesn't have an actual meaning (for `saveAsTable`) but I am not sure. – Gabio Nov 15 '22 at 07:41
  • In another answer I found that it may work if you pass the full path instead of the table name: https://stackoverflow.com/questions/55380427/azure-databricks-can-not-create-the-managed-table-the-associated-location-alre That question is about Databricks, but maybe it will also work in your case (check the answer from Jan 7, 2021 at 2:24). – M_S Nov 15 '22 at 08:19