I have multiple Spark Streaming jobs writing to the same Iceberg table, each updating different fields. The Iceberg documentation says: "Iceberg supports multiple concurrent writes using optimistic concurrency."
But this error appears when one of the MERGE statements runs:

Caused by: org.apache.iceberg.exceptions.ValidationException: Found conflicting files that can contain records matching true
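For context, my understanding of the optimistic-concurrency model the docs describe (a toy sketch in plain Python, not Iceberg code): each writer reads the current snapshot, prepares its change, and the commit succeeds only if the snapshot it was based on is still current; otherwise the writer must re-read and retry.

```python
import threading

class ToyTable:
    """Toy model of optimistic-concurrency commits (not Iceberg code)."""

    def __init__(self):
        self.snapshot_id = 0
        self.rows = {}
        self._lock = threading.Lock()  # stands in for the atomic catalog swap

    def commit(self, based_on, updates):
        """Apply updates only if no other commit landed since `based_on`."""
        with self._lock:
            if based_on != self.snapshot_id:
                return False  # a conflicting commit won the race -> caller retries
            self.rows.update(updates)
            self.snapshot_id += 1
            return True

def write_with_retry(table, updates, max_retries=4):
    """Optimistic write: re-read the snapshot and retry on conflict."""
    for _ in range(max_retries):
        snapshot = table.snapshot_id  # read current state before writing
        if table.commit(snapshot, updates):
            return True
    return False

t = ToyTable()
write_with_retry(t, {"a": 1})
write_with_retry(t, {"b": 2})
```

In this toy model conflicts are resolved by retrying; my question is why Iceberg instead raises the ValidationException above rather than retrying the merge.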
Spark Merge:
spark.sql(
    """
    MERGE INTO datahub.replicacao.pefin_table tgt
    USING (SELECT nu_documento, co_cadus, aud_enttyp, nu_particao FROM pefin_pf) src
    ON tgt.nu_documento = src.nu_documento AND src.nu_particao IN ('1', '2', '4')
    WHEN MATCHED AND src.aud_enttyp = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
    """)
Spark Session Configs:
val spark = SparkSession.builder()
.master("local[*]")
.config("spark.sql.catalog.datahub", "org.apache.iceberg.spark.SparkSessionCatalog")
.config("spark.sql.catalog.datahub.type", "hadoop")
.config("spark.sql.catalog.datahub", "org.apache.iceberg.spark.SparkCatalog")
.config("spark.sql.catalog.datahub.warehouse", "file:///C:/dev/warehouse")
.config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
.getOrCreate()
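Also for reference, the docs describe tuning how many times an optimistic commit is retried, via table properties; a sketch with illustrative values (the `commit.retry.*` names are documented Iceberg properties, the numbers are just guesses on my part):

```sql
-- Illustrative values only; commit.retry.* are documented Iceberg
-- table properties controlling optimistic-concurrency commit retries.
ALTER TABLE datahub.replicacao.pefin_table SET TBLPROPERTIES (
  'commit.retry.num-retries' = '10',
  'commit.retry.min-wait-ms' = '100'
);
```

This did not seem relevant to the ValidationException, since that appears to be a validation failure rather than a retry exhaustion, but I mention it in case it matters.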