I'm working on a CQRS/ES architecture. We run multiple asynchronous projections into the read stores in parallel because some projections might be much slower than others and we want to stay more in sync with the write side for the faster projections.
I'm trying to understand the approaches for generating the read models and how much data duplication each of them entails.
Let's take an order with items as a simplified example. An order can have multiple items, each item has a name. Items and orders are separate aggregates.
I could save the read models in a more normalized fashion, where I create an entity or document for each item and each order and reference them - or in a more denormalized manner, where an order contains its items inline.
Normalized
{
  Id: Order1,
  Items: [Item1, Item2]
}
{
  Id: Item1,
  Name: "Foosaver 9000"
}
{
  Id: Item2,
  Name: "Foosaver 7500"
}
Using a more normalized format would allow a single projection to process events that affect items and orders and update the corresponding objects. It would also mean that any change to an item's name affects all orders, past and future. A customer might, for example, get a delivery note listing different items than the corresponding invoice (so obviously that model might not be good enough and could lead us to the same issues as denormalizing...)
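As a sketch of how a single projection could maintain such a normalized model - event names like `ItemCreated`, `ItemRenamed` and `OrderPlaced` are assumptions here, and in-memory dictionaries stand in for the read store:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical events; the real stream's event names and payloads may differ.
public record ItemCreated(Guid ItemId, string Name);
public record ItemRenamed(Guid ItemId, string NewName);
public record OrderPlaced(Guid OrderId, List<Guid> ItemIds);

public class ItemDoc { public Guid Id; public string Name; }
public class OrderDoc { public Guid Id; public List<Guid> ItemIds; }

// One projection maintains both normalized documents, so item and order
// events are always applied in stream order relative to each other.
public class NormalizedProjection
{
    public readonly Dictionary<Guid, ItemDoc> Items = new();
    public readonly Dictionary<Guid, OrderDoc> Orders = new();

    public void When(ItemCreated e) =>
        Items[e.ItemId] = new ItemDoc { Id = e.ItemId, Name = e.Name };

    // A rename updates the single item document - and thereby what
    // every order referencing it displays, past orders included.
    public void When(ItemRenamed e) => Items[e.ItemId].Name = e.NewName;

    public void When(OrderPlaced e) =>
        Orders[e.OrderId] = new OrderDoc { Id = e.OrderId, ItemIds = e.ItemIds };
}
```

Note how `ItemRenamed` retroactively changes what an already placed order resolves to - exactly the delivery-note-vs-invoice mismatch.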
Denormalized
{
  Id: Order1,
  Items: [
    {Id: Item1, Name: "Foosaver 9000"},
    {Id: Item2, Name: "Foosaver 7500"}
  ]
}
Denormalizing however would require some source where I can look up the current related data - such as the item. This means that I either have to transport all the information I might need in the event, or I'll have to keep track of the data that I source for my denormalization. This would also mean that I might have to do this once for each projection - i.e. I might need a denormalized ItemForOrder as well as a denormalized ItemForSomethingElse - both only containing the bare minimum properties that each of the denormalized entities or documents need (whenever they are created or modified).
If I shared the same Item in the read store, I could end up mixing item definitions from different points in time, because the projections for items and orders might not run at the same pace. In the worst case, the item projection might not yet have created the item whose properties I need to source.
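A sketch of a denormalizing projection that owns a private lookup instead of sharing one (event names are again assumptions); the comment marks the spot where a shared lookup filled by a differently paced projection could bite:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical events and documents; names are assumptions.
public record ItemCreated(Guid ItemId, string Name);
public record OrderPlaced(Guid OrderId, List<Guid> ItemIds);

public class ItemInfo { public Guid Id; public string Name; }
public class OrderDoc { public Guid Id; public List<ItemInfo> Items; }

// The order projection keeps its own item lookup rather than sharing
// one with the (independently paced) item projection.
public class DenormalizedOrderProjection
{
    private readonly Dictionary<Guid, string> _itemNames = new(); // source for denormalization
    public readonly Dictionary<Guid, OrderDoc> Orders = new();

    public void When(ItemCreated e) => _itemNames[e.ItemId] = e.Name;

    public void When(OrderPlaced e) =>
        Orders[e.OrderId] = new OrderDoc
        {
            Id = e.OrderId,
            Items = e.ItemIds.Select(id => new ItemInfo
            {
                Id = id,
                // With a lookup owned by another projection, this lookup
                // could miss or return a stale name; here the same projection
                // has already processed every earlier ItemCreated.
                Name = _itemNames.TryGetValue(id, out var name) ? name : null
            }).ToList()
        };
}
```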
Generally, what approaches do I have when processing relationships from an event stream?
Update 2016-06-17
Currently, I'm solving this by running a single projection per denormalised read model and its related data. If I have multiple read models that have to share the same related data, then I might put them into the same projection to avoid duplicating the same related data I need for the lookup.
These related models might even be somewhat normalised, optimised for however I have to access them. My projection is the only thing that reads and writes to them, so I know exactly how they are read.
// related data
public class Item
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    /* and whatever else is needed but not provided by events */
}

// denormalised info for document
public class ItemInfo
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

// denormalised data as document
public class ItemStockLevel
{
    public ItemInfo Item { get; set; } // when this is a document
    public decimal Quantity { get; set; }
}

// or for RDBMS
public class ItemStockLevel
{
    public Guid ItemId { get; set; }
    public string ItemName { get; set; }
    public decimal Quantity { get; set; }
}
The more subtle issue, however, is when to update which related data. That depends heavily on the business process.
For example, I wouldn't want the item descriptions of an order to change after the order has been placed. When the projection processes an event, it must only update the data that the business process says has changed.
Therefore, an argument could be made for putting this information into the event (and using the data as the client sent it?). If we later find that we need additional data, we might have to fall back to projecting the related data from the event stream and reading it from there...
This could be seen as a similar issue for pure CQRS architectures: when do you update the denormalised data in your documents? When do you refresh the data before presenting it to the user? Again, the business process might drive this decision.
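To illustrate the event-carried variant: if `OrderPlaced` carries a snapshot of the item data (the `OrderLine` payload below is a hypothetical shape), the projection copies from the event and never from a lookup, so later renames cannot leak into placed orders:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical: the event carries the item data as the client saw it.
public record OrderLine(Guid ItemId, string Name);
public record OrderPlaced(Guid OrderId, List<OrderLine> Lines);
public record ItemRenamed(Guid ItemId, string NewName);

public class ItemInfo { public Guid Id; public string Name; }
public class OrderDoc { public Guid Id; public List<ItemInfo> Items; }

public class OrderProjection
{
    public readonly Dictionary<Guid, OrderDoc> Orders = new();

    // Copy from the event payload: the order keeps the item data
    // exactly as it was at the moment the order was placed.
    public void When(OrderPlaced e) =>
        Orders[e.OrderId] = new OrderDoc
        {
            Id = e.OrderId,
            Items = e.Lines
                .Select(l => new ItemInfo { Id = l.ItemId, Name = l.Name })
                .ToList()
        };

    // A later rename deliberately does not touch placed orders,
    // because the business process says their descriptions are frozen.
    public void When(ItemRenamed e) { /* no-op for this read model */ }
}
```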