
When building a web application, it is a standard practice to use one Unit of Work per HTTP request, and flush all the changes and commit the transaction once after handling the request.

What approach is used when building a batch process? Is one Unit of Work instance used for the whole execution of the process, with transactions committed periodically at logical points?

This may be more of a discussion than a factual question, but I'm hoping to find out if there is a commonly-accepted "best practice" analogous to Session-per-Request.

pnschofield
  • In terms of 'batch process', do you mean a process that does lots of things in one go or a process that has a number of steps with indeterminate pauses in-between? – David Osborne Jun 06 '18 at 15:14
  • It does lots of work in one go. For example, it parses a large text file and performs one or more CRUD operations for each line. – pnschofield Jun 11 '18 at 17:01

2 Answers


Unit of Work is your business transaction. It is defined at the scope of an ISession. It should be short, but not too short; that is why Session-per-Request is recommended. With it, you can take advantage of various UoW features like change tracking, the first-level cache, auto-flushing, etc. This can avoid some round trips to the database and improve performance.
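A minimal sketch of what Session-per-Request looks like with NHibernate (the ISessionFactory setup, the request hookup, and the Customer entity are assumptions for illustration):

```csharp
// Hypothetical request handler: one ISession (UoW) per request.
// Assumes an ISessionFactory configured once at application startup.
public void HandleRequest(ISessionFactory sessionFactory)
{
    using (ISession session = sessionFactory.OpenSession())   // one UoW per request
    using (ITransaction tx = session.BeginTransaction())
    {
        var customer = session.Get<Customer>(42);   // cached in the first-level cache
        customer.Name = "Updated";                  // change tracked; no SQL issued yet

        tx.Commit();                                // flush + commit once, at the end
    }
}
```

A second `session.Get<Customer>(42)` within the same request would be served from the first-level cache without another database round trip, which is exactly the benefit lost with shorter scopes.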

With a very short ISession scope like Session-per-Operation (no UoW at all), you miss all of the benefits mentioned above.

With an unnecessarily large ISession scope like Session-per-Application, or by grouping unrelated operations, you create many problems: invalid proxy state, increased memory usage, etc.

Considering the above, for batch processing, try to identify smaller UoWs within your batch. If you can split the batch into small UoW parts, do so. If you cannot split the batch, you have two ways to go:

  1. Single ISession for the entire batch:
    If your batch processes the same records over and over, this may be useful. With delayed flushing, you will get some performance benefit.
    Even if your batch processes each record only once, this may still help due to reduced flushes and saved round trips to the database. Refer to point 2 below.
  2. New ISession for each operation in the batch:
    If your batch processes each record only once, this may be better. I cannot say for sure, as the complete scenario is unknown.

Both approaches have the drawbacks mentioned above; it is better to find smaller UoWs inside your batch.
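For the file-parsing batch described in the question, splitting into smaller UoWs could look like this (a sketch only; the chunk size, `path`, and the `MapLine` helper are assumptions, and `Enumerable.Chunk` requires .NET 6+):

```csharp
// One ISession per chunk of lines: each chunk is its own small UoW.
// ChunkSize and MapLine are illustrative; tune them for your batch.
const int ChunkSize = 1000;

foreach (var chunk in File.ReadLines(path).Chunk(ChunkSize))
{
    using (ISession session = sessionFactory.OpenSession())
    using (ITransaction tx = session.BeginTransaction())
    {
        foreach (var line in chunk)
            session.Save(MapLine(line));   // hypothetical line-to-entity mapping

        tx.Commit();   // flush once per chunk, then discard the session
    }
}
```

This keeps each session's first-level cache small and bounded while still batching flushes, avoiding the memory growth of a single session over the whole file.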

For bulk read operations, IStatelessSession is the better solution.
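A sketch of a bulk read with IStatelessSession (the Order entity, the filter, and the `Export` step are assumptions):

```csharp
// IStatelessSession has no first-level cache and no change tracking,
// so memory stays flat even across very large result sets.
using (IStatelessSession stateless = sessionFactory.OpenStatelessSession())
{
    var orders = stateless.Query<Order>()          // hypothetical entity
                          .Where(o => o.Total > 100m);

    foreach (var order in orders)
        Export(order);                             // hypothetical read-only processing
}
```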

Amit Joshi

The unit of work would be per request or shorter (a lifetime scope). Longer-lived contexts/transactions will lead to memory-use and performance issues.

If it is a process where, while a page is active, the user selects records to be actioned on at a later stage in that session, then I would consider tracking the selected IDs and applicable modifications client-side, to be provided to the "act" step when triggered, or recording a simple batch record in server session state or the DB to associate the selected/modified entities. If stored in the DB, there should be a date-time associated with the record and an automatic process to clean off any unprocessed batches that don't get finalized (e.g. the user abandons the session by closing the browser).

If it were a case of wanting to batch up records across many requests, such as web requests getting batched into groups of <=1000, or a process that runs every hour, then I'd use a persistent data structure where the requests commit the data to the batch structure, grouped by a batch-run record which tracks the current state of the batch:

  1. Check the current batch status.
  2. If open, add/associate the record to the current batch.
  3. If closed/processing, create a new batch and associate the record with the new batch.

Interactions with the batch should use pessimistic locking so that the background process doesn't begin processing a batch while requests are still being written.
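The check/add/create flow with a pessimistic lock could be sketched like this in NHibernate terms (the BatchRun/BatchItem entities, the status values, and `record` are assumptions; `SetLockMode` issues the pessimistic row lock):

```csharp
// Associate an incoming record with the current open batch run.
using (ISession session = sessionFactory.OpenSession())
using (ITransaction tx = session.BeginTransaction())
{
    // 1. Check the current batch status, locked so the background
    //    job cannot start processing it mid-write.
    var batch = session.CreateCriteria<BatchRun>()
        .Add(Restrictions.Eq("Status", BatchStatus.Open))
        .SetLockMode(LockMode.Upgrade)
        .UniqueResult<BatchRun>();

    // 3. If closed/processing (no open batch found), create a new one.
    if (batch == null)
    {
        batch = new BatchRun { Status = BatchStatus.Open };
        session.Save(batch);
    }

    // 2. Add/associate the record to the open batch.
    session.Save(new BatchItem { Batch = batch, Payload = record });
    tx.Commit();
}
```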

The background batch process queries for the batch it should begin processing, updates the status, processes it, and finalizes the batch.

Steve Py
  • This particular use is for a batch process that works outside the scope of any kind of web application. It parses a large text file and performs one or more CRUD operations for each line. You gain some performance by reusing an existing UoW, because a DbContext/NHibernate Session contains a first-level cache, meaning that entities fetched by ID do not require another round trip to the database. – pnschofield Jun 11 '18 at 17:04
  • 1
    In my experience NHibernate caching will do a better job of a job like that than EF. I had a similar though to a process that was going through ~60k rows in my case to perform file operations on the data. Unfortunately with EF, having the context open to re-use previously loaded data in the file generation process, but noticed the per-file processing time creep up steadily even though the amount of data processed with each file didn't vary by much. (time per-file doubled by the end.) Changing that to use a shorter-lived context resulted in consistent per-file performance. – Steve Py Jun 11 '18 at 22:33