I have multiple Spark Streaming jobs writing to the same Iceberg table, each updating different fields. The Iceberg documentation says: "Iceberg supports multiple concurrent writes using optimistic concurrency."
But this error appears when one of the MERGE statements runs:

Caused by: org.apache.iceberg.exceptions.ValidationException: Found conflicting files that can contain records matching true
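For context, my understanding of the optimistic-concurrency model the docs describe (a toy sketch in plain Python, not Iceberg code): each writer reads the current snapshot, prepares its change, and the commit succeeds only if the snapshot it was based on is still current; otherwise the writer must re-read and retry.

```python
import threading

class ToyTable:
    """Toy model of optimistic-concurrency commits (not Iceberg code)."""

    def __init__(self):
        self.snapshot_id = 0
        self.rows = {}
        self._lock = threading.Lock()  # stands in for the atomic catalog swap

    def commit(self, based_on, updates):
        """Apply updates only if no other commit landed since `based_on`."""
        with self._lock:
            if based_on != self.snapshot_id:
                return False  # a conflicting commit won the race -> caller retries
            self.rows.update(updates)
            self.snapshot_id += 1
            return True

def write_with_retry(table, updates, max_retries=4):
    """Optimistic write: re-read the snapshot and retry on conflict."""
    for _ in range(max_retries):
        snapshot = table.snapshot_id  # read current state before writing
        if table.commit(snapshot, updates):
            return True
    return False

t = ToyTable()
write_with_retry(t, {"a": 1})
write_with_retry(t, {"b": 2})
```

In this toy model conflicts are resolved by retrying; my question is why Iceberg instead raises the ValidationException above rather than retrying the merge.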
Spark Merge:
spark.sql(
    """
    MERGE INTO datahub.replicacao.pefin_table tgt
    USING (SELECT nu_documento, co_cadus, aud_enttyp, nu_particao FROM pefin_pf) src
    ON tgt.nu_documento = src.nu_documento AND src.nu_particao IN ('1', '2', '4')
    WHEN MATCHED AND src.aud_enttyp = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
    """)
Spark Session Configs:
val spark = SparkSession.builder()
.master("local[*]")
.config("spark.sql.catalog.datahub", "org.apache.iceberg.spark.SparkSessionCatalog")
.config("spark.sql.catalog.datahub.type", "hadoop")
.config("spark.sql.catalog.datahub", "org.apache.iceberg.spark.SparkCatalog")
.config("spark.sql.catalog.datahub.warehouse", "file:///C:/dev/warehouse")
.config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
.getOrCreate()
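Also for reference, the docs describe tuning how many times an optimistic commit is retried, via table properties; a sketch with illustrative values (the `commit.retry.*` names are documented Iceberg properties, the numbers are just guesses on my part):

```sql
-- Illustrative values only; commit.retry.* are documented Iceberg
-- table properties controlling optimistic-concurrency commit retries.
ALTER TABLE datahub.replicacao.pefin_table SET TBLPROPERTIES (
  'commit.retry.num-retries' = '10',
  'commit.retry.min-wait-ms' = '100'
);
```

This did not seem relevant to the ValidationException, since that appears to be a validation failure rather than a retry exhaustion, but I mention it in case it matters.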