
In my Spark job, I tried to overwrite a table in each micro-batch of a Structured Streaming query:

batchDF.write.mode(SaveMode.Overwrite).saveAsTable("mytable")
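For context, the write is issued from foreachBatch, roughly like this (a minimal sketch; streamingDF and the checkpoint path are placeholders for my actual source and setup):

import org.apache.spark.sql.{DataFrame, SaveMode}

// Overwrite the managed table on every micro-batch.
val writeBatch = (batchDF: DataFrame, batchId: Long) => {
  batchDF.write.mode(SaveMode.Overwrite).saveAsTable("mytable")
}

val query = streamingDF.writeStream                              // streamingDF: streaming source DataFrame (placeholder)
  .foreachBatch(writeBatch)
  .option("checkpointLocation", "/tmp/checkpoints/mytable")      // placeholder path
  .start()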

This write generated the following error:

  Can not create the managed table('`mytable`'). The associated location('file:/home/ec2-user/environment/spark/spark-local/spark-warehouse/mytable') already exists.;

I know that in Spark 2.x, the way to solve this issue is to set the following option:

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")

It works well in Spark 2.x. However, this option was removed in Spark 3.0.0. How should we solve this issue in Spark 3.0.0?

Thanks!

yyuankm
  • Please try to explicitly specify the path where you're going to save with the 'overwrite' mode. – John Thomas Sep 19 '20 at 17:33
  • Thanks John, I can confirm it works by adding a path in Spark 3.0. The way I add the path is as follows: ```batchDF.write.mode(SaveMode.Overwrite).option("path", "/home/ec2-user/environment/spark/spark-local/tmp").saveAsTable("mytable")```. I am deploying in standalone mode. Do you also have any comments on what the correct path would be if I want to deploy it to a Hadoop cluster? Thanks! – yyuankm Sep 21 '20 at 23:56
  • Does setting "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation" to "true" also delete the remaining files? Otherwise, you might get a mix of old and new files. – Hanan Shteingart Jan 28 '21 at 12:28
  • I can confirm that this works: I am loading JSON-format local Hive tables in integration tests and have specified the same directory that was being used by default (which is in my IDE), and now it doesn't fail if the files already exist. – stephen newman Feb 19 '21 at 01:52
  • @yyuankm Did you find a solution for this issue ? – Abdennacer Lachiheb Jun 24 '22 at 09:51

1 Answer


It looks like you run your test data generation and your actual test in the same process. Can you just replace these writes with createOrReplaceTempView, so the data is saved to Spark's in-memory catalog instead of into a Hive catalog?

Something like: batchDF.createOrReplaceTempView("mytable")
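In the streaming case, a sketch of what that could look like inside foreachBatch (streamingDF is a placeholder for your source):

import org.apache.spark.sql.DataFrame

// Register each micro-batch as a temp view in the session's in-memory catalog
// instead of writing a managed Hive table; the view is replaced on every batch.
val registerBatch = (batchDF: DataFrame, batchId: Long) => {
  batchDF.createOrReplaceTempView("mytable")
}

streamingDF.writeStream   // streamingDF: streaming source DataFrame (placeholder)
  .foreachBatch(registerBatch)
  .start()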