Is it possible to use Delta Live Tables to perform incremental batch processing?
I believe the following code will reload all of the data available in the directory every time the pipeline runs:
CREATE LIVE TABLE lendingclub_raw
COMMENT "The raw loan risk dataset, ingested from /databricks-datasets."
TBLPROPERTIES ("quality" = "bronze")
AS SELECT * FROM parquet.`/databricks-datasets/samples/lending_club/parquet/`
But if we instead do:
CREATE STREAMING LIVE TABLE lendingclub_raw
COMMENT "The raw loan risk dataset, ingested from /databricks-datasets."
TBLPROPERTIES ("quality" = "bronze")
AS SELECT * FROM cloud_files("/databricks-datasets/samples/lending_club/parquet/", "parquet")
will it load only the new data on each run, if the pipeline is run in triggered mode?
I know that you can achieve incremental batch processing with Auto Loader in a regular job by using the trigger mode .trigger(once=True) or .trigger(availableNow=True) and running the job on a schedule.
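For reference, the non-DLT pattern I have in mind looks roughly like this. This is only a sketch assuming a Databricks runtime; the checkpoint location and target table name are placeholders I made up:

# Batch-incremental Auto Loader outside DLT (sketch; requires Databricks).
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .load("/databricks-datasets/samples/lending_club/parquet/")
)

(
    df.writeStream
    # Process only files not yet recorded in the checkpoint, then stop.
    .trigger(availableNow=True)
    .option("checkpointLocation", "/tmp/lendingclub/_checkpoint")  # placeholder path
    .toTable("lendingclub_raw")
)

The checkpoint is what makes scheduled reruns incremental: each run picks up only the files that arrived since the last run, then terminates.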
Since you cannot explicitly define a trigger in DLT, how does this work?