
Our users go through a multi-step workflow - the further they go, the more objects we create. We also allow users to go back to Step#1 and change one of the existing objects, which may cause inconsistencies, so we must update/delete some of the objects at Step#2. I see 2 options:

  1. Update/delete objects from Step#2 right away. This leads to:

    • An operation that's supposed to be a simple PATCH of an entity field becomes complicated. It's also a shared object between multiple workflows, so we'd have to add if-statements and do different things depending on the workflow.
    • Circular dependencies. Operations on Step#1 have to know about objects/operations on Step#2.
    • On each request in Step#1 we'd have to load data for Step#2 in order to determine whether Step#2 really needs to be updated, which slows down operations on Step#1. So to change 1 record in the DB we'd have to load hundreds (or even thousands) of records for Step#2.
    • Many actions on Step#1 may require fixing the state at Step#2, so we have to ensure we don't forget anything, today and in the future.
  2. Fix Step#2 lazily - when the user goes there (our current approach, sketched after this list). Step#2 will recognize that objects are inconsistent and fix them. This leaves just 1 place where we need to care, but:

    • Until the user opens Step#2, the DB will contain inconsistent objects. This hasn't caused any problems so far, but I can imagine it complicating future SQL migrations.
    • We update DB state on a GET request. This doesn't seem like that big of a deal since GET stays idempotent anyway, but it still feels awkward.
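
To make the current approach concrete, here is a rough sketch of the lazy fix on GET (entity names and the derivation logic are simplified for illustration, not our real code):

```typescript
// Illustrative types; the real entities are more complex.
interface Step1Item { id: number; value: string; version: number; }
interface Step2Item { id: number; basedOnStep1Id: number; basedOnVersion: number; data: string; }

// Stand-ins for real repositories/DB access.
const step1Items = new Map<number, Step1Item>();
const step2Items = new Map<number, Step2Item>();

// Option 2: the GET handler for Step#2 reconciles state lazily before returning it.
function getStep2(): Step2Item[] {
  for (const s2 of step2Items.values()) {
    const s1 = step1Items.get(s2.basedOnStep1Id);
    if (!s1) {
      // The source object was deleted on Step#1 -> drop the derived object.
      step2Items.delete(s2.id);
    } else if (s1.version !== s2.basedOnVersion) {
      // The source object changed -> rebuild the derived object.
      s2.data = `derived from ${s1.value}`;
      s2.basedOnVersion = s1.version;
    }
  }
  return [...step2Items.values()]; // GET stays idempotent: repeating it changes nothing further.
}
```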

Does anyone know of better approaches? Or maybe improvements to these two?

Update

I haven't found a perfect solution, but eventually we implemented an improved version of #1. When updating state on Step#1 we also set a flag "needs to rebuild Step#2"; when the UI opens Step#2 it first checks this flag and issues a PUT to rebuild the state, and only then GETs Step#2.

This still means that the DB state is inconsistent for some period of time, but at least we'll know this for sure from the flag in the DB, and if needed we could write migrations that take this flag into account. It also leaves the door open (if needed in the future) to create an async job that fixes the state.
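
Roughly, the flow looks like this (a simplified sketch; in reality the flag is a DB column and these are real endpoints, but the names here are illustrative):

```typescript
// Simplified, illustrative sketch of the flag-based flow (not our real API).
let step2NeedsRebuild = false; // in reality a flag/column stored in the DB

// Step#1 update: just persist the change and raise the flag. No Step#2 data is loaded here.
function patchStep1Item(id: number, patch: { value?: string }): void {
  // ...apply the patch to the Step#1 entity...
  step2NeedsRebuild = true;
}

// Rebuild endpoint: the UI calls this (PUT) before it GETs Step#2.
function rebuildStep2(): void {
  if (!step2NeedsRebuild) return;   // nothing to do
  // ...recompute/update/delete Step#2 objects from Step#1 data...
  step2NeedsRebuild = false;        // state is consistent again
}

// What the UI does when the user opens Step#2.
function openStep2(): void {
  rebuildStep2(); // PUT /step2/rebuild
  // GET /step2 now returns consistent data
}
```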

Stanislav Bashkyrtsev
  • A third option: you could run a batch job overnight to reconcile the state of any modified workflows. Then you achieve a sort of eventual consistency in the database. – jaco0646 Nov 18 '20 at 14:12
  • @jaco0646, well, the user can go to Step#2 right away, so even if we go with the job, we'll need to be able to correct the state on the next GET. Though we could combine the approaches and _ensure_ that experiments are in a consistent state even if the user went home without going to the next step. – Stanislav Bashkyrtsev Nov 18 '20 at 14:26

3 Answers


I think it is more flexible to separate the state from the context in which the objects are stored. Any creation of a new object at any step is accompanied by preserving the invariants and the consistency of the context.

There are separate rules for the states - rules for transitioning from one state to another and for which objects may be created - and separate rules for the context: rules for its consistency, which are enforced every time it changes.
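
For illustration, a minimal sketch of this separation (class and method names are made up):

```typescript
// Purely illustrative sketch: the context owns consistency, the state owns transitions.
type Step = "STEP_1" | "STEP_2";

class WorkflowContext {
  private step1Objects: string[] = [];
  private step2Objects: string[] = [];

  // Every mutation goes through the context, which restores consistency immediately.
  addStep1Object(obj: string): void {
    this.step1Objects.push(obj);
    this.restoreConsistency();
  }

  private restoreConsistency(): void {
    // Re-derive Step#2 objects from Step#1 objects so the invariant always holds.
    this.step2Objects = this.step1Objects.map(o => `derived(${o})`);
  }
}

class WorkflowState {
  // Transition rules live separately from the context's consistency rules.
  private static allowed: Record<Step, Step[]> = { STEP_1: ["STEP_2"], STEP_2: ["STEP_1"] };

  constructor(private current: Step = "STEP_1") {}

  transitionTo(next: Step): void {
    if (!WorkflowState.allowed[this.current].includes(next)) {
      throw new Error(`Cannot go from ${this.current} to ${next}`);
    }
    this.current = next;
  }
}
```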

alex_noname
  • Not sure how this relates.. In your case you have a Context which has to keep things consistent. Problem stays the same - do you "invoke" the Context right away to make things consistent (and complicate PUT/PATCH operations) or do you do this lazily during next GET? – Stanislav Bashkyrtsev Nov 16 '20 at 21:36
  • I mean that data changes should not occur when extracting data, but when creating it – alex_noname Nov 17 '20 at 07:03

What about asynchronous cleanup of dirty data?

  1. Whenever the user goes back to Step #1 and changes something, mark all related data as "dirty" (e.g. add links to it in a "DirtyData" table) and be done for now.
  2. Have a DataCleanup worker (e.g. a separate thread or similar) that constantly looks for data to be cleaned up.
  3. Before editing data for Step #2, check that the data is not dirty.

Depending on your logic, 3) might result in a user error (e.g. the user would need to repeat Step #2). If the DataCleanup worker has enough resources (i.e. it processes the DirtyData table almost instantaneously), that should happen only on very rare occasions. If that is not OK, you could opt for checking for dirty data on each fetch, but that could be expensive.
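
A rough sketch of the idea (the set below stands in for a real "DirtyData" table, and the loop for a real background worker):

```typescript
// Rough, illustrative sketch of the dirty-data approach.
const dirtyStep2Ids = new Set<number>(); // stands in for a "DirtyData" table

// 1. Step#1 change handler: mark related Step#2 data as dirty and return immediately.
function onStep1Changed(relatedStep2Ids: number[]): void {
  relatedStep2Ids.forEach(id => dirtyStep2Ids.add(id));
}

// 2. Cleanup worker: runs in the background and reconciles dirty records.
async function dataCleanupWorker(): Promise<void> {
  while (true) {
    for (const id of dirtyStep2Ids) {
      // ...rebuild/update/delete the Step#2 object with this id...
      dirtyStep2Ids.delete(id);
    }
    await new Promise(resolve => setTimeout(resolve, 1000)); // poll interval
  }
}

// 3. Before editing Step#2 data, refuse (or wait) if it is still dirty.
function editStep2(id: number): void {
  if (dirtyStep2Ids.has(id)) {
    throw new Error("Data is being cleaned up, please retry");
  }
  // ...apply the edit...
}
```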

StoneyKeys
  • Async checks are great if data on Step#2 just gets outdated (we don't reference it) and it's easy to determine from DB columns. As you mentioned, this becomes a Cleanup job. But my functionality is less straightforward - e.g. if something was _added_ on Step#1 I'd need to delete, update or _add_ things on Step#2, and that's hard to determine just from DB data. So a Cleanup job isn't feasible. Running dirty checks upon each fetch, as you suggested later, is actually a pretty cheap operation compared to other stuff on Step#2. That's the reason why we chose it initially. – Stanislav Bashkyrtsev Nov 17 '20 at 09:19

It sounds like you're familiar with the HTTP spec regarding GET requests, but for future readers: GET is defined as a safe method (RFC 7231 §4.2.1), meaning its semantics are essentially read-only, so clients don't expect a GET to change server state.

For the other bullet under 2, we probably don't need a specification to agree that persisting valid data is preferable to persisting invalid data.

So what can we do for the bullets under 1 to avoid complex branching logic in a particular step and also circular dependencies? My suggestion is an event-driven design. When step #1 changes it should fire a change event. In this scenario, step #1 has no knowledge of the concrete listener(s) who may receive its events, so it remains decoupled from any complex handling logic.

There's probably no way to guarantee you don't forget anything in the future; but if every step in the workflow is defined as a listener, it forces you to consider change events to some extent every time you implement a new step.

One side note on granularity: if a step has many changes, it can batch up its events rather than fire each one individually. You can adjust the batch size for efficiency.

In summary, I would strongly consider the Observer design pattern.
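
A minimal sketch of what that could look like (names are illustrative; the reconciliation logic is elided):

```typescript
// Illustrative Observer sketch: Step#1 publishes change events,
// later steps subscribe and react without Step#1 knowing who they are.
interface Step1ChangeEvent { changedObjectIds: number[]; }

interface Step1ChangeListener {
  onStep1Changed(event: Step1ChangeEvent): void;
}

class Step1Service {
  private listeners: Step1ChangeListener[] = [];

  addListener(listener: Step1ChangeListener): void {
    this.listeners.push(listener);
  }

  updateObject(id: number): void {
    // ...apply the simple PATCH to the Step#1 entity...
    // Then notify whoever cares; Step#1 stays free of Step#2-specific branching.
    this.notify({ changedObjectIds: [id] });
  }

  private notify(event: Step1ChangeEvent): void {
    this.listeners.forEach(l => l.onStep1Changed(event));
  }
}

class Step2Reconciler implements Step1ChangeListener {
  onStep1Changed(event: Step1ChangeEvent): void {
    // Could run asynchronously to avoid slowing down Step#1 requests.
    // ...update/delete/create the Step#2 objects affected by the change...
  }
}

// Wiring:
const step1 = new Step1Service();
step1.addListener(new Step2Reconciler());
```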

jaco0646
  • Yes, Observer unties the 2 pieces of functionality at compile time. The only downside left (and I forgot to mention it) is that in order to determine if Step#2 needs updates we need to load its data. Meaning that no matter whether we actually want to update something on Step#2, we'll have to do extra work which slows down all operations on Step#1. Well, unless this information comes along from the UI... but that ties the functionality back together, on a different level though. – Stanislav Bashkyrtsev Nov 17 '20 at 09:10
  • The updates can be asynchronous. Of course that adds complexity of its own, but it's possible to avoid the performance hit on step #1. – jaco0646 Nov 17 '20 at 19:41
  • Caching is another solution to the performance problem, and may be simpler than asynchronous logic. – jaco0646 Nov 18 '20 at 16:19