Is there a way to remove the files belonging to a partition without physically deleting them in Iceberg?

Question

There is add_files() to add files from a Hive table to Iceberg, but I cannot find a way to reverse that operation other than dropping the table and recreating it.

CALL spark_catalog.system.add_files(
  table => 'db.tbl',
  source_table => 'db.src_tbl',
  partition_filter => map('date', '2023-03-16', 'hour', '12')
);

Everything works as expected up to this step, but if I now want to add all the files belonging to 2023-03-16, it complains that some files are duplicates.

java.lang.IllegalStateException: 
Cannot complete import because data files to be imported already exist within the target table: 
.../part-00000-d9d0137c-d7d6-46f5-b78a-9f68b977c7af.c000.zstd.parquet.  
This is disabled by default as Iceberg is not designed for multiple references to the same file within the same table.  
If you are sure, you may set 'check_duplicate_files' to false to force the import.

Obviously I don't want to add duplicates either. Is there a solution?

Yes, that is what I did. It is not straightforward, but it works with some tweaking. https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1679005212089589 - Dyno Fu, Mar 20 '23 at 03:05

score 0 · Accepted Answer · answered May 04 '23 at 14:49

Summary from the community Slack thread:

Use a snapshot management procedure; it's pretty convoluted.
Use DELETE FROM: it is a metadata-only delete if the filter matches whole partitions, and it won't touch the data files.
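
The two options above could look roughly like the following in Spark SQL. This is a sketch, not a tested recipe: it assumes the table from the question (db.tbl in spark_catalog), that date and hour are string-valued partition columns, and it uses a made-up snapshot ID that you would replace with the ID of the snapshot committed just before the unwanted add_files call.

```sql
-- Option 1 (snapshot management): list the snapshots, pick the one committed
-- just before the unwanted add_files call, and roll the table back to it.
SELECT committed_at, snapshot_id, operation
FROM spark_catalog.db.tbl.snapshots
ORDER BY committed_at;

-- 1234567890123456789 is a placeholder, not a real snapshot ID.
CALL spark_catalog.system.rollback_to_snapshot(
  table => 'db.tbl',
  snapshot_id => 1234567890123456789
);

-- Option 2 (DELETE FROM): the filter uses only partition columns and covers
-- the whole partition, so the delete should be metadata-only.
DELETE FROM spark_catalog.db.tbl
WHERE `date` = '2023-03-16' AND `hour` = '12';

-- After either option, import the whole day in one call.
CALL spark_catalog.system.add_files(
  table => 'db.tbl',
  source_table => 'db.src_tbl',
  partition_filter => map('date', '2023-03-16')
);
```

Note that a rollback also discards every other commit made after the chosen snapshot, which is part of why the DELETE FROM route tends to be simpler. In both cases the files stay on disk at first, but later maintenance such as expire_snapshots may physically delete files that Iceberg no longer references, so it is worth checking this before running it on a table built with add_files.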


Dyno Fu
