
I need to either add a new document with children or add a child to an already existing parent. The only way I know how to do it is ugly:

// CaseParams stands in for the real parameter type ("params" is a reserved word in C#)
public async Task AddOrUpdateCase(CaseParams args)
{
    try
    {
        await UpdateCase(args);
    }
    catch (RequestFailedException ex)
    {
        // anything other than "not found" is a real error
        if (ex.Status != (int)HttpStatusCode.NotFound)
            throw;

        await AddCase(args);
    }
}

private async Task UpdateCase(CaseParams args)
{
    // this line throws when document is not found
    var caseResponse = await searchClient.GetDocumentAsync<Case>(args.CaseId);
    // need to add to existing collection
    caseResponse.Value.Children.Add(args.Child);
    // ...then send the whole modified document back to the index
}

I think there wouldn't be any problem if this document didn't contain a collection. You cannot use MergeOrUpload to add to a child collection; you need to load the existing elements from the index and add the new one. Is there a better way to do it?

Piotr Perak

1 Answer


Azure Cognitive Search doesn't support partial updates to collection fields, so retrieving the entire document, modifying the relevant collection field, and sending the document back to the index is the only way to accomplish this.

The only improvement I would suggest to the code you've shown is to search for the documents you want to update instead of retrieving them one-by-one. That way, you can update them in batches. Index updates are much more expensive than queries, so to reduce overhead you should batch updates together wherever possible.
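As a rough sketch of that batching idea, assuming the Azure.Search.Documents SDK and a `Case` model like the one in the question: the class `CaseBatchUpdater`, the method `AddChildrenAsync`, and the `Child` type are made-up names, and the filter assumes the key field is called `CaseId`, is marked filterable, and holds values without commas or quotes.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

// Hypothetical model types mirroring the question.
public class Child { public string Text { get; set; } }
public class Case
{
    public string CaseId { get; set; }
    public List<Child> Children { get; set; } = new List<Child>();
}

public static class CaseBatchUpdater
{
    public static async Task AddChildrenAsync(
        SearchClient searchClient,
        IReadOnlyDictionary<string, Child> newChildByCaseId)
    {
        // One query fetches every affected case instead of one lookup per case.
        var options = new SearchOptions
        {
            Filter = $"search.in(CaseId, '{string.Join(",", newChildByCaseId.Keys)}', ',')",
            Size = newChildByCaseId.Count
        };
        SearchResults<Case> results = await searchClient.SearchAsync<Case>("*", options);

        var updated = new List<Case>();
        await foreach (SearchResult<Case> result in results.GetResultsAsync())
        {
            Case doc = result.Document;
            doc.Children.Add(newChildByCaseId[doc.CaseId]);
            updated.Add(doc);
        }

        // Cases the query did not return still have to be created; that part is omitted.
        if (updated.Count > 0)
        {
            // A single indexing request carries all modified documents.
            await searchClient.MergeOrUploadDocumentsAsync(updated);
        }
    }
}
```

Because each document is sent back with its complete `Children` list, the collection field is replaced as a whole, which is what the read/modify/write approach needs.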

Note that if you have all the data needed to re-construct the entire document at indexing time, you can skip the step of retrieving the document first, which would be a big improvement. Azure Cognitive Search doesn't yet support concurrency control for updating documents in the index, so you're better off having a single process writing to the index anyway. This should hopefully eliminate the need to read the documents before updating and writing them back. This is assuming you're not using the search index as your primary store, which you really should avoid.
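A minimal sketch of that approach, assuming the Azure.Search.Documents SDK: `ICaseStore`, `LoadCaseWithChildrenAsync`, and `CaseIndexer` are hypothetical names standing in for whatever primary store and indexing process you actually have.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Search.Documents;

// Hypothetical model types mirroring the question.
public class Child { public string Text { get; set; } }
public class Case
{
    public string CaseId { get; set; }
    public List<Child> Children { get; set; } = new List<Child>();
}

// Hypothetical abstraction over the primary data store (for example, a database).
public interface ICaseStore
{
    Task<Case> LoadCaseWithChildrenAsync(string caseId);
}

public static class CaseIndexer
{
    public static async Task ReindexCaseAsync(
        ICaseStore store, SearchClient searchClient, string caseId)
    {
        // Build the complete document from the source of truth...
        Case fullCase = await store.LoadCaseWithChildrenAsync(caseId);

        // ...and upload it. Upload inserts the document if it is new and
        // replaces it entirely if it exists, so no prior read of the index is needed.
        await searchClient.UploadDocumentsAsync(new[] { fullCase });
    }
}
```

With this shape the add-versus-update distinction from the question disappears on the index side, since the same call covers both cases.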

If you need to add or update items in complex collections often, it's probably a sign that you need a different data model for your index. Complex collections have limitations (see "Maximum elements across all complex collections per document") that make them impractical for scenarios where the cardinality of the parent-to-child relationship is high. For situations like this, it's better to have a secondary index that includes the "child" entities as top-level documents instead of elements of a complex collection. That has benefits for incremental updates, but also for storage utilization and some types of queries.
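To illustrate the alternative model, here is a small sketch in plain C# that maps a parent with nested children to top-level "child" documents for a secondary index. All type and property names (`CaseEmail`, `EmailDocument`, `EmailKey`, and so on) are invented for the example, and the key format is only one possible choice.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical parent/child model.
public class CaseEmail
{
    public string Id { get; set; }
    public string Subject { get; set; }
    public string Body { get; set; }
}

public class CaseWithEmails
{
    public string CaseId { get; set; }
    public List<CaseEmail> Emails { get; set; } = new List<CaseEmail>();
}

// One document per child in the secondary index; CaseId links back to the parent.
public class EmailDocument
{
    public string EmailKey { get; set; }   // document key in the secondary index
    public string CaseId { get; set; }     // reference to the parent case
    public string Subject { get; set; }
    public string Body { get; set; }
}

public static class EmailIndexMapper
{
    public static List<EmailDocument> ToEmailDocuments(CaseWithEmails c) =>
        c.Emails.Select(e => new EmailDocument
        {
            EmailKey = $"{c.CaseId}_{e.Id}",
            CaseId = c.CaseId,
            Subject = e.Subject,
            Body = e.Body
        }).ToList();
}
```

With documents shaped like this, adding a child means uploading one new `EmailDocument`; the parent document and its siblings are not read or rewritten.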

Bruce Johnston
  • I'm always updating only one document. It's a case with a one-to-many relationship to e-mails (no more than a few). So I created `Case` as the main document and emails as a nested collection. I will only be adding to the `emails` collection and updating `Case` fields. I will need to search in email subjects, bodies, and addresses and return the `Case` with its `emails` as a result. What do you think of this model? I haven't used Azure search before. – Piotr Perak Nov 30 '20 at 20:14
  • If there are few details in Case that aren't in the collection of emails, and you plan to insert/update individual emails frequently, I'd use a denormalized model where email is the document and Case details are repeated per email. On the other hand, if Case has many such details and they change frequently, then your current model is better. – Bruce Johnston Nov 30 '20 at 22:37
  • If I used a denormalized model, would I have to update all of the emails that belong to a `Case` whenever a `Case` field changes? – Piotr Perak Dec 01 '20 at 05:08
  • Yes, so it's a tradeoff. If both `Case` fields and `emails` fields are updated frequently, you could have two indexes, query both, and do a kind of "client-side join" of the results in your app, but then you'd have to choose which relevance score to use to order results (case or email), when what you probably want is a score that considers both. There's no "one-size-fits-all" solution here. – Bruce Johnston Dec 01 '20 at 18:44
  • I added a paragraph to my answer that explains a way to avoid the read/modify/write pattern that you're currently using. Hopefully that also helps. – Bruce Johnston Dec 01 '20 at 18:50