
The Iceberg documentation discusses using merge-on-read when deleting data. It also distinguishes position deletes from equality deletes. Specifying merge-on-read in the table properties seems straightforward.

I've looked through the Iceberg documentation and also found half a dozen external sites that talk about the pros and cons of each method, but none of them describe how to specify position versus equality deletes. Is this a table property? How do I choose a method?

I'm using Spark 3.3 on EMR with Scala/Python.

1 Answer


You don't need to specify position (POS) or equality (EQ) deletes. The engine selects between these two delete methods automatically, depending on the scenario.
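For reference, what the table properties do control is the copy-on-write versus merge-on-read choice, not the delete-file type. Here is a minimal sketch that builds the corresponding Spark SQL, assuming a hypothetical table named `my_catalog.db.events`. The property names (`write.delete.mode`, `write.update.mode`, `write.merge.mode`, `format-version`) are the ones I know from the Iceberg table configuration docs; please double-check them against your Iceberg version.

```python
# Sketch only: builds an ALTER TABLE statement that sets the row-level
# operation mode. The table name used below is a hypothetical example.

# Iceberg write-mode properties for DELETE, UPDATE and MERGE INTO.
MODE_PROPERTIES = ("write.delete.mode", "write.update.mode", "write.merge.mode")
VALID_MODES = ("copy-on-write", "merge-on-read")


def row_level_mode_sql(table: str, mode: str = "merge-on-read") -> str:
    """Return the Spark SQL that sets all three write modes on `table`."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    props = {}
    if mode == "merge-on-read":
        # Delete files only exist in format version 2 tables.
        props["format-version"] = "2"
    for name in MODE_PROPERTIES:
        props[name] = mode
    body = ", ".join(f"'{key}'='{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({body})"


sql = row_level_mode_sql("my_catalog.db.events")
print(sql)
# With a live SparkSession you would then run: spark.sql(sql)
```

Note there is no position/equality entry in that statement: as far as I know, no table property selects the delete-file type.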

To get the most out of Iceberg, pay attention to the following:

  • Choose between merge-on-read and copy-on-write
  • Merge (compact) files according to a specified policy
  • Expire snapshots and delete old data
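The last two points can be sketched with Iceberg's Spark stored procedures. This is a rough example, assuming a hypothetical catalog `my_catalog` and table `db.events`; `rewrite_data_files` and `expire_snapshots` are the procedure names I know from the Iceberg Spark procedures docs, so verify the arguments against your version before running anything.

```python
# Sketch only: builds the CALL statements for compaction and snapshot expiry.
# The catalog and table names used below are hypothetical examples.
from datetime import datetime


def compaction_sql(catalog: str, table: str) -> str:
    """SQL that rewrites (compacts) the table's data files."""
    return f"CALL {catalog}.system.rewrite_data_files(table => '{table}')"


def expire_snapshots_sql(catalog: str, table: str, older_than: datetime) -> str:
    """SQL that expires snapshots older than the given timestamp."""
    ts = older_than.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{ts}')"
    )


cutoff = datetime(2022, 11, 1, 0, 0, 0)  # example cutoff, pick your own
compact = compaction_sql("my_catalog", "db.events")
expire = expire_snapshots_sql("my_catalog", "db.events", cutoff)
print(compact)
print(expire)
# With a live SparkSession: spark.sql(compact); spark.sql(expire)
```

As I understand it, rows removed with merge-on-read stay in the old data files until those files are rewritten and the snapshots that reference them are expired, which is why these two steps matter for actually purging data.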

Hope it helps you.

liliwei
  • Thanks for that clarification @liliwei. Is there a way to force EQ deletes? Our deletes are super expensive right now when trying to implement GDPR/CCPA opt-outs on a large data set. I've done performance testing on merge-on-read versus copy-on-write and on delete versus merge, and I don't even care about snapshots yet because Iceberg deletes are prohibitively expensive. – Peter Connolly Nov 07 '22 at 14:35