I am using PySpark and want to insert-overwrite partitions into an existing Hive table.
- saveAsTable() is not suitable in this use case: it overwrites the whole existing table.
- insertInto() behaves strangely: I have 3 partition levels, but it inserts only one.
And what is the right way to use save()? Can save() take options like a database name and a table name to insert into, or only an HDFS path?
Example:
df\
.write\
.format('orc')\
.mode('overwrite')\
.option('database', db_name)\
.option('table', table_name)\
.save()