
There is add_files() to add files from a Hive table to an Iceberg table, but I cannot find a way to reverse that operation other than dropping the table and recreating it.

CALL spark_catalog.system.add_files(
  table => 'db.tbl',
  source_table => 'db.src_tbl',
  partition_filter => map('date', '2023-03-16', 'hour', '12')
);

Everything works as expected up to this step. But if I now want to add all files belonging to 2023-03-16, it complains that some files are duplicates:

java.lang.IllegalStateException: 
Cannot complete import because data files to be imported already exist within the target table: 
.../part-00000-d9d0137c-d7d6-46f5-b78a-9f68b977c7af.c000.zstd.parquet.  
This is disabled by default as Iceberg is not designed for multiple references to the same file within the same table.  
If you are sure, you may set 'check_duplicate_files' to false to force the import.
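The error comes from a duplicate check that add_files() performs before committing. If any file selected by the new partition_filter is already referenced by the target table, the import is refused. Here, the hour 12 files imported earlier are also part of the date-only selection. Below is a minimal sketch of that comparison. It assumes the existing paths were collected from the `file_path` column of the `db.tbl.files` metadata table, and the candidate paths from the source partitions. `already_imported` is a hypothetical helper, not part of Iceberg's API, and the paths are made up.

```python
def already_imported(existing_paths, candidate_paths):
    """Return the candidate data files the table already references.

    existing_paths: file paths tracked by the Iceberg table (for example the
        `file_path` column of `db.tbl.files`).
    candidate_paths: file paths a new add_files() call would import.
    A non-empty result roughly corresponds to what add_files() reports.
    """
    existing = set(existing_paths)
    return sorted(p for p in candidate_paths if p in existing)


if __name__ == "__main__":
    # Hypothetical paths: hour 12 was imported earlier, hour 13 was not.
    existing = ["/warehouse/src_tbl/date=2023-03-16/hour=12/part-00000.parquet"]
    candidates = [
        "/warehouse/src_tbl/date=2023-03-16/hour=12/part-00000.parquet",
        "/warehouse/src_tbl/date=2023-03-16/hour=13/part-00000.parquet",
    ]
    print(already_imported(existing, candidates))
```

Setting check_duplicate_files to false only skips this check. The overlapping files would then be referenced twice.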

Obviously I don't want to add duplicates either. Is there a solution?

Dyno Fu
  • Did you try to roll back the table to the previous snapshot? – shay__ Mar 19 '23 at 11:10
  • Yes, that is what I did. It is not straightforward, but it works with some tweaks. https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1679005212089589 – Dyno Fu Mar 20 '23 at 03:05
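Following the rollback suggestion, here is a minimal sketch of that route. It assumes the add_files() call for hour 12 was the latest commit on db.tbl, and that its snapshot id was read from the `db.tbl.snapshots` metadata table. The sketch returns to that commit's parent with Iceberg's `rollback_to_snapshot` procedure, then re-runs add_files() with a date-only partition_filter. The helper names (`snapshot_before`, `rollback_sql`, `add_files_sql`) and the snapshot ids are hypothetical. The generated strings would be passed to `spark.sql(...)` in a session with the Iceberg extensions configured. This does not reproduce the tweaks mentioned in the Slack thread.

```python
# Sketch: undo an add_files() commit by rolling back to its parent snapshot,
# then re-import the whole day. Helper names are hypothetical; only the CALL
# statements they build are Iceberg procedures.


def snapshot_before(snapshots, bad_snapshot_id):
    """Return the parent of `bad_snapshot_id`.

    `snapshots` is a list of dicts shaped like rows of the `db.tbl.snapshots`
    metadata table; only `snapshot_id` and `parent_id` are read.
    """
    for row in snapshots:
        if row["snapshot_id"] == bad_snapshot_id:
            if row["parent_id"] is None:
                raise ValueError("first snapshot of the table; nothing to roll back to")
            return row["parent_id"]
    raise KeyError(f"snapshot {bad_snapshot_id} not found")


def rollback_sql(table, snapshot_id):
    """Build a rollback_to_snapshot call (positional: table, snapshot id)."""
    return f"CALL spark_catalog.system.rollback_to_snapshot('{table}', {snapshot_id})"


def add_files_sql(table, source_table, partition):
    """Build an add_files call; `partition` maps partition column -> value.

    Values are inserted without escaping, so only use trusted literals.
    """
    pairs = ", ".join(f"'{k}', '{v}'" for k, v in partition.items())
    return (
        "CALL spark_catalog.system.add_files("
        f"table => '{table}', source_table => '{source_table}', "
        f"partition_filter => map({pairs}))"
    )


if __name__ == "__main__":
    # Hypothetical history: 222 is the add_files() commit for hour 12.
    history = [
        {"snapshot_id": 111, "parent_id": None},
        {"snapshot_id": 222, "parent_id": 111},
    ]
    target = snapshot_before(history, 222)
    # In PySpark these would be run with spark.sql(...).
    print(rollback_sql("db.tbl", target))
    print(add_files_sql("db.tbl", "db.src_tbl", {"date": "2023-03-16"}))
```

Rolling back discards every commit made after the target snapshot, not only the add_files() one, so check the snapshot history before running it.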

1 Answer


Summary from the community Slack thread.

Dyno Fu