
I'm working on a CQRS/ES architecture. We run multiple asynchronous projections into the read stores in parallel, because some projections might be much slower than others and we want the faster projections to stay more closely in sync with the write side.

I'm trying to understand the approaches for generating the read models, and how much data duplication each of them entails.

Let's take an order with items as a simplified example. An order can have multiple items, each item has a name. Items and orders are separate aggregates.

I could save the read models in a more normalized fashion, creating an entity or document for each item and each order and referencing them - or in a more denormalized manner, where an order directly contains its items.

Normalized

{
  Id: Order1,
  Items: [Item1, Item2]
}

{
  Id: Item1,
  Name: "Foosaver 9000"
}

{
  Id: Item2,
  Name: "Foosaver 7500"
}

Using a more normalized format would allow a single projection to process events that affect items and orders, and update the corresponding objects. It would also mean that any change to an item's name affects all orders referencing it. A customer might, for example, receive a delivery note listing different items than the corresponding invoice (so that model might not be good enough either, and leads us to the same issues as denormalizing...).
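For illustration, a minimal sketch of what such a single projection could look like. The event shapes (borrowed from the comment thread further down) and the IDocumentStore abstraction are assumptions, not a prescribed API:

using System;
using System.Collections.Generic;

// Hypothetical event shapes, assumed for illustration.
public record ArticleDefined(Guid ArticleId, string Name);
public record OrderCreated(Guid OrderId, Guid CustomerId);
public record ItemAdded(Guid OrderId, Guid ArticleId);
public record ArticleRebranded(Guid ArticleId, string NewName);

// Hypothetical store abstraction; assumed to partition documents by type.
public interface IDocumentStore
{
  T Load<T>(Guid id);
  void Save<T>(Guid id, T document);
}

public class ItemDocument { public Guid Id { get; set; } public string Name { get; set; } }
public class OrderDocument { public Guid Id { get; set; } public List<Guid> Items { get; set; } }

// One projection handles all four event types and keeps the
// normalized documents up to date.
public class NormalizedProjection
{
  private readonly IDocumentStore _store;
  public NormalizedProjection(IDocumentStore store) => _store = store;

  public void When(ArticleDefined e) =>
    _store.Save(e.ArticleId, new ItemDocument { Id = e.ArticleId, Name = e.Name });

  public void When(OrderCreated e) =>
    _store.Save(e.OrderId, new OrderDocument { Id = e.OrderId, Items = new List<Guid>() });

  public void When(ItemAdded e)
  {
    var order = _store.Load<OrderDocument>(e.OrderId);
    order.Items.Add(e.ArticleId);
    _store.Save(e.OrderId, order);
  }

  // A rename touches a single document, which is exactly why it
  // implicitly affects every order referencing the item.
  public void When(ArticleRebranded e)
  {
    var item = _store.Load<ItemDocument>(e.ArticleId);
    item.Name = e.NewName;
    _store.Save(e.ArticleId, item);
  }
}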

Denormalized

{
  Id: Order1,
  Items: [
    {Id: Item1, Name: "Foosaver 9000"},
    {Id: Item2, Name: "Foosaver 7500"}
  ]
}

Denormalizing, however, requires some source where I can look up the current related data - such as the item. This means that I either have to transport all the information I might need in the events, or I have to keep track of the data that I source for my denormalization. It also means that I might have to do this once per projection - i.e. I might need a denormalized ItemForOrder as well as a denormalized ItemForSomethingElse, each containing only the bare minimum of properties that the denormalized entity or document needs (whenever it is created or modified).

If I shared the same Item in the read store, I could end up mixing item definitions from different points in time, because the projections for items and orders might not run at the same pace. In the worst case, the projection for items might not yet have created the item whose properties I need.
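For illustration, those per-projection lookups might be as small as this (class and property names are hypothetical; Description stands in for whatever the other projection's bare minimum happens to be):

using System;

// Bare-minimum lookup kept by the order projection.
public class ItemForOrder
{
  public Guid Id { get; set; }
  public string Name { get; set; }
}

// A different projection's bare minimum might be different properties.
public class ItemForSomethingElse
{
  public Guid Id { get; set; }
  public string Description { get; set; }
}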

Generally, what approaches do I have when processing relationships from an event stream?


update 2016-06-17

Currently, I'm solving this by running a single projection per denormalised read model, together with its related data. If multiple read models have to share the same related data, I might put them into the same projection to avoid duplicating the related data needed for the lookups.

These related models might even be somewhat normalised, optimised for however I have to access them. My projection is the only thing that reads and writes to them, so I know exactly how they are read.

// related data
public class Item
{
  public Guid Id { get; set; }
  public string Name { get; set; }
  /* and whatever else is needed but not provided by events */
}

// denormalised info for the document
public class ItemInfo
{
  public Guid Id { get; set; }
  public string Name { get; set; }
}

// denormalised data as a document
public class ItemStockLevel
{
  public ItemInfo Item { get; set; } // when this is a document
  public decimal Quantity { get; set; }
}

// or, for an RDBMS
public class ItemStockLevel
{
  public Guid ItemId { get; set; }
  public string ItemName { get; set; }
  public decimal Quantity { get; set; }
}
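A sketch of the projection that ties these together: it keeps the related Item lookup and the ItemStockLevel document in the same projection, so both always reflect the same stream position. The event shapes are hypothetical, and IDocumentStore is the same assumed abstraction as in the earlier sketch:

// Hypothetical events; IDocumentStore as in the earlier sketch.
public record ArticleDefined(Guid ArticleId, string Name);
public record StockLevelChanged(Guid ArticleId, decimal Quantity);

public class ItemStockLevelProjection
{
  private readonly IDocumentStore _store;
  public ItemStockLevelProjection(IDocumentStore store) => _store = store;

  // The projection maintains its own related data...
  public void When(ArticleDefined e) =>
    _store.Save(e.ArticleId, new Item { Id = e.ArticleId, Name = e.Name });

  // ...so when it denormalises, the lookup is always at this
  // projection's own stream position, never ahead or behind.
  public void When(StockLevelChanged e)
  {
    var item = _store.Load<Item>(e.ArticleId);
    _store.Save(e.ArticleId, new ItemStockLevel
    {
      Item = new ItemInfo { Id = item.Id, Name = item.Name },
      Quantity = e.Quantity
    });
  }
}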

However, the more hidden issue here is when to update which related data. This depends heavily on the business process.

For example, I wouldn't want to change the item descriptions of an order after it has been placed. When the projection processes an event, it must only update the data that changed according to the business process.
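For instance, an order projection might honour that rule with a handler like the following sketch: the lookup is updated so that future orders denormalise the new name, while order documents that were already written stay untouched. The ArticleRebranded event is hypothetical (echoing the comment thread further down), and _store is the same assumed document store as above:

// Hypothetical rename event.
public record ArticleRebranded(Guid ArticleId, string NewName);

// Handler inside a hypothetical order projection.
public void When(ArticleRebranded e)
{
  // Update the projection's related data, so orders placed after
  // this point denormalise the new name...
  var item = _store.Load<Item>(e.ArticleId);
  item.Name = e.NewName;
  _store.Save(e.ArticleId, item);

  // ...but deliberately do not rewrite existing order documents:
  // an order keeps the item name the customer saw when placing it.
}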

Therefore, an argument could be made for putting this information into the event (and using the data as the client sent it?). If we later find that we need additional data, we might have to fall back to projecting the related data from the event stream and reading it from there...

This could be seen as a similar issue for pure CQRS architectures: when do you update the denormalised data in your documents? When do you refresh the data before presenting it to the user? Again, the business process might drive this decision.

urbanhusky

2 Answers


First, I think you want to be careful in your aggregates about life cycles. In the usual shopping cart domain, the cart (Order) lifecycle spans that of the items. Udi Dahan wrote Don't Create Aggregate Roots, which I've found to mean that aggregates hold a reference to the aggregate that "created" them, rather than the other way around.

Therefore, I would expect the event history to look like

// Assuming Orders come from Customers
OrderCreated(orderId: Order1, customerId: Customer1)

ItemAdded(itemId: Item1, orderId: Order1, Name:"Foosaver 9000")

ItemAdded(itemId: Item2, orderId: Order1, Name:"Foosaver 7500")

Now, it's still the case that there are no guarantees here about ordering - that's going to depend on how the aggregates are designed in the write model, whether your event store linearizes events across different histories, and so on.

Notice that in your normalized views, you could go from the order to the items, but not the other way around. Processing the events I've described gives you that same limitation: instead of Orders with mysterious items, you have items with mysterious orders. Anybody who looks for an order either doesn't see it yet, sees it empty, or sees it with some number of items; and can follow links from those items to the key store.

Your normalized forms in your key-value store don't need to change from your example; the projection responsible for writing the normalized form of orders needs to be smart enough to watch the item streams too, but it's all good.
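A sketch of that projection, using the event shapes above. Because there is no cross-stream ordering guarantee, it upserts the order document when an item happens to arrive first. The store abstraction and its TryLoad method are assumptions for illustration:

using System;
using System.Collections.Generic;

public record OrderCreated(Guid OrderId, Guid CustomerId);
public record ItemAdded(Guid ItemId, Guid OrderId, string Name);

// Hypothetical store; TryLoad returns null when the document is missing.
public interface IDocumentStore
{
  T TryLoad<T>(Guid id) where T : class;
  void Save<T>(Guid id, T document);
}

public class ItemDocument { public Guid Id { get; set; } public string Name { get; set; } public Guid OrderId { get; set; } }
public class OrderDocument { public Guid Id { get; set; } public List<Guid> Items { get; set; } = new(); }

public class OrderViewProjection
{
  private readonly IDocumentStore _store;
  public OrderViewProjection(IDocumentStore store) => _store = store;

  public void When(OrderCreated e) =>
    _store.Save(e.OrderId, _store.TryLoad<OrderDocument>(e.OrderId)
                           ?? new OrderDocument { Id = e.OrderId });

  // This projection watches the item streams too: the item event
  // carries the order reference, so the order's item list can be
  // maintained even though items live in separate streams.
  public void When(ItemAdded e)
  {
    _store.Save(e.ItemId, new ItemDocument { Id = e.ItemId, Name = e.Name, OrderId = e.OrderId });

    // No ordering guarantee across streams: the order may not have
    // been seen yet, so upsert an empty one to attach the item to.
    var order = _store.TryLoad<OrderDocument>(e.OrderId)
                ?? new OrderDocument { Id = e.OrderId };
    order.Items.Add(e.ItemId);
    _store.Save(e.OrderId, order);
  }
}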

(Also note: we're eliding ItemRemoved here)

That's OK, but it misses the idea that reads happen more often than writes. For hot queries, you are going to want the denormalized form available: the data in the store is the DTO that you are going to send in response to the query. For example, if the query were supporting a report on the order (no edits allowed), then you wouldn't need to send the item ids either.

{
    Title: "Your order #{Order1}",
    Items: [
        {Name: "Foosaver 9000"},
        {Name: "Foosaver 7500"}
    ]
}

One thing that you might consider is tracking the versions of the aggregates in question, so that when the user navigates from one view to the next, rather than getting a stale projection, the query pauses and waits for the new projection to catch up.

For instance, if your DTO were hypermedia, then it might look something like

{
    Title: "Your order #{Order1}",
    refreshUrl: /orders/Order1?atLeastVersion=20,
    Items: [
        {Name: "Foosaver 9000", detailsUrl: /items/Item1?atLeastVersion=7},
        {Name: "Foosaver 7500", detailsUrl: /items/Item2?atLeastVersion=9}
    ]
}
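On the read side, honouring atLeastVersion might look roughly like this sketch. The checkpoint abstraction and the polling loop are assumptions; a real implementation might instead block on notifications from the projection:

using System;
using System.Threading.Tasks;

// Hypothetical: reports the version up to which the projection
// has processed a given stream.
public interface IProjectionCheckpoint
{
  Task<long> GetVersionAsync(string streamId);
}

public class VersionGatedReader
{
  private readonly IProjectionCheckpoint _checkpoint;
  public VersionGatedReader(IProjectionCheckpoint checkpoint) => _checkpoint = checkpoint;

  // Instead of returning a stale document, wait (bounded by a timeout)
  // until the projection has caught up to the requested version.
  public async Task WaitForVersionAsync(string streamId, long atLeastVersion, TimeSpan timeout)
  {
    var deadline = DateTime.UtcNow + timeout;
    while (await _checkpoint.GetVersionAsync(streamId) < atLeastVersion)
    {
      if (DateTime.UtcNow >= deadline)
        throw new TimeoutException(
          $"Projection for {streamId} has not reached version {atLeastVersion}.");
      await Task.Delay(50);
    }
  }
}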
VoiceOfUnreason
  • So this would boil down to having all the descriptions etc. I might need in the event - something which I'm hesitant to implement because I might not know all the fields I might need to consume later, plus it tightly couples the read and write side yet again. – urbanhusky May 25 '16 at 18:21
  • I don't think that's right. All the data you need needs to appear somewhere in the event stream used to create the projections, but I don't see why there would need to be anything that you weren't already putting into the event to rehydrate the aggregate? – VoiceOfUnreason May 25 '16 at 19:16
  • For rehydration I have all the properties I need on my aggregate - but the projections need data from multiple aggregates. Items (or rather articles) are separate aggregates because they do not exist only in the context of a single order (multiple orders might be for the same article). So I either have to track all of them in my read store (with the aforementioned concurrency/storage issues) - or find some other means to get the data I need for the denormalized projections – urbanhusky May 25 '16 at 20:11
  • I can't quite tell - have you considered the possibility that your projection generator can listen to multiple aggregate streams? If so, can you clarify what problem you foresee in that case? – VoiceOfUnreason May 25 '16 at 20:43
  • Yes, the generators listen to all events they need. Take the stream of events `[ArticleDefined(article1, "Foosaver 9000"), OrderCreated(order1, customer1), ItemAdded(order1, article1), ArticleRebranded(article1, "Supersaver 9000")]` - the order projection needs to know about the articles in order to denormalize them into the order. I guess I would need an `ArticleRepository` on the read side, querying it when handling the `ItemAdded` event for the name and other details of the article I might need. My concerns are that I either need a repository per generator, or have inconsistencies... – urbanhusky May 25 '16 at 21:51
  • If you know the aggregates you need, and which versions, you have the option of reloading them from the event store when you go to build the projection...? – VoiceOfUnreason May 25 '16 at 23:39
  • That would be too tightly coupling the domain model to the read model, and kind of goes against CQRS with its separate read and write models. It would also make the projections incredibly slow - and very inconsistent (not just eventually consistent, but actually wrong data - because the write position might be at a completely different state than the read) – urbanhusky May 26 '16 at 10:00
  • @urbanhusky The solution I've used in the past is exactly what you're proposing: to have an `ArticleRepository` (we called them simply "lookup tables") that would support the generation of the "final" read models. That seemed like a good solution and didn't bring any extra burden on the projections side… What are your concerns on going down this route? – gtramontina Jun 01 '16 at 14:49
  • My concerns are the data-duplication that might be needed for different read models that rely on the same data. Every read model projection might be at a different position in the event stream, so I'd have to track the related objects (i.e. articles) for each projection. (I'm also in a special situation, we have to deploy to 150 servers, so all storage cost is times 150). – urbanhusky Jun 01 '16 at 18:20

I also had this problem and tried different things. I read this suggestion and, while I have not tried it yet, I think it may be the best way to go: just enrich events before publishing them.
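A sketch of what that enrichment might look like. The names are hypothetical; the point is only that the event published to the projections already carries the related data:

using System;

// The event as the aggregate emits it.
public record ItemAdded(Guid OrderId, Guid ArticleId);

// The event as published to the projections, enriched with related data.
public record ItemAddedEnriched(Guid OrderId, Guid ArticleId, string ArticleName);

// Hypothetical enricher: resolves related data once, before publishing,
// so downstream projections need no lookup stores of their own.
public class ItemAddedEnricher
{
  private readonly Func<Guid, string> _lookupArticleName;
  public ItemAddedEnricher(Func<Guid, string> lookupArticleName) =>
    _lookupArticleName = lookupArticleName;

  public ItemAddedEnriched Enrich(ItemAdded e) =>
    new(e.OrderId, e.ArticleId, _lookupArticleName(e.ArticleId));
}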

Narvalex
  • 27
  • I do agree that events should contain all the information required to describe it. This might not be enough for all cases, but that should be an exception. Some read model projections might have to keep a local state for further enrichment of the read model though - you have to be careful that the local state is updated synchronously with the read model (i.e. read model at position X sees state at position X). However, that would only work when the event store has global ordering of course. – urbanhusky Nov 16 '17 at 10:22