# spark_combine_first() and dup_collect() are my own helpers:
# spark_combine_first(a, b) coalesces two columns, and dup_collect(cols)
# returns (duplicated_column_names, unique_column_names)
final = sqlCtx.read.table('XXX.YYY')

for elem in elem_list:  # elem_list is the list of DataFrames to merge in
    interim = final.join(elem, 'user_id', 'fullouter')
    final = interim.select(
        ['user_id']
        + [spark_combine_first(final[c], elem[c]).alias(c)
           for c in dup_collect(interim.columns)[0] if c != 'user_id']
        + [c for c in dup_collect(interim.columns)[1] if c != 'user_id']
    )

# workaround: stage the result in a temp table, re-read it, then overwrite
final.write.mode('overwrite').saveAsTable('XXX.temp_test')
final2 = sqlCtx.read.table('XXX.temp_test')
final2.write.mode('overwrite').saveAsTable('XXX.YYY')
This is my mock code. As you can see, I read from a table and ultimately write back to that same table on our Hadoop cluster, but I get an error saying the table cannot be overwritten while it is also being read from.
I have found a temporary workaround (writing to a temporary table, reading that back into a new DataFrame, and finally writing to the required table), but this seems very inefficient, since it writes the full table twice.
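One alternative I have come across but have not verified on my setup is breaking the plan's lineage with a checkpoint, so that the final write no longer depends on XXX.YYY. A minimal sketch, assuming Spark 2.1+, sc as the SparkContext, and a writable checkpoint directory (the path below is hypothetical):

# untested alternative: checkpoint to truncate lineage so the final plan
# no longer reads from XXX.YYY (needs Spark 2.1+; path is hypothetical)
sc.setCheckpointDir('hdfs:///tmp/spark-checkpoints')

final = final.checkpoint()  # eager by default: materializes to the dir
final.write.mode('overwrite').saveAsTable('XXX.YYY')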
I was hoping for another approach in which I could simply rename the temp table from within the Spark API, but I have not had much success.
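Roughly the shape of what I was after (a sketch only; I never got a variant of this working reliably):

# hoped-for approach: swap the staged table into place via DDL
final.write.mode('overwrite').saveAsTable('XXX.temp_test')
sqlCtx.sql('DROP TABLE XXX.YYY')
sqlCtx.sql('ALTER TABLE XXX.temp_test RENAME TO XXX.YYY')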