
Using the Azure Search .NET SDK, you might get an IndexBatchException when you try to index documents.

From the documentation here:

        try
        {
            var batch = IndexBatch.Upload(documents);
            indexClient.Documents.Index(batch);
        }
        catch (IndexBatchException e)
        {
            // Sometimes when your Search service is under load, indexing will fail for some of the documents in
            // the batch. Depending on your application, you can take compensating actions like delaying and
            // retrying. For this simple demo, we just log the failed document keys and continue.
            Console.WriteLine(
                "Failed to index some of the documents: {0}",
                String.Join(", ", e.IndexingResults.Where(r => !r.Succeeded).Select(r => r.Key)));
        }

How should e.FindFailedActionsToRetry be used to create a new batch to retry the indexing for failed actions?

I've created a function like this:

    public void UploadDocuments<T>(SearchIndexClient searchIndexClient, IndexBatch<T> batch, int count) where T : class, IMyAppSearchDocument
    {
        try
        {
            searchIndexClient.Documents.Index(batch);
        }
        catch (IndexBatchException e)
        {
            if (count == 5) //we will try to index 5 times and give up if it still doesn't work.
            {
                throw new Exception("IndexBatchException: Indexing Failed for some documents.");
            }

            Thread.Sleep(5000); //we got an error, wait 5 seconds and try again (in case it's an intermittent or network issue)

            var retryBatch = e.FindFailedActionsToRetry<T>(batch, arg => arg.ToString());
            UploadDocuments(searchIndexClient, retryBatch, count + 1); //count++ would pass the original value, so the retry count would never increase
        }
    }

But I think this part is wrong:

var retryBatch = e.FindFailedActionsToRetry<T>(batch, arg => arg.ToString());
richard

2 Answers


The second parameter to FindFailedActionsToRetry, named keySelector, is a function that should return whatever property on your model type represents your document key. In your example, the model type is not known at compile time inside UploadDocuments, so you'll need to change UploadDocuments to also take the keySelector as a parameter and pass it through to FindFailedActionsToRetry. The caller of UploadDocuments would then specify a lambda specific to type T. For example, if T is the sample Hotel class from the sample code in this article, the lambda must be hotel => hotel.HotelId, since HotelId is the property of Hotel that is used as the document key.
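For illustration, here is a sketch of what that change could look like, keeping the retry cap and the IMyAppSearchDocument constraint from your code (the string key type is an assumption; use whatever type your key property has):

    public void UploadDocuments<T>(
        SearchIndexClient searchIndexClient,
        IndexBatch<T> batch,
        int count,
        Func<T, string> keySelector) where T : class, IMyAppSearchDocument
    {
        try
        {
            searchIndexClient.Documents.Index(batch);
        }
        catch (IndexBatchException e)
        {
            if (count == 5) // give up after 5 attempts
            {
                throw new Exception("IndexBatchException: Indexing failed for some documents.");
            }

            Thread.Sleep(5000);

            // The caller's lambda knows how to extract the key from T.
            var retryBatch = e.FindFailedActionsToRetry(batch, keySelector);
            UploadDocuments(searchIndexClient, retryBatch, count + 1, keySelector);
        }
    }

The caller then supplies the lambda for its own model type, e.g. for the Hotel sample: UploadDocuments(indexClient, batch, 0, hotel => hotel.HotelId);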

Incidentally, the wait inside your catch block should not be a constant amount of time. If your search service is under heavy load, a constant delay won't really give it time to recover. Instead, we recommend exponentially backing off (e.g., the first delay is 2 seconds, then 4 seconds, then 8 seconds, then 16 seconds, up to some maximum).
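A sketch of that backoff, written as a loop instead of recursion; the 30-second cap and maxRetries bound are illustrative values, not SDK defaults, and keySelector is assumed to be in scope as described above:

    for (int attempt = 1; attempt <= maxRetries; attempt++)
    {
        try
        {
            searchIndexClient.Documents.Index(batch);
            return; // everything indexed successfully
        }
        catch (IndexBatchException e)
        {
            if (attempt == maxRetries)
            {
                throw;
            }

            // Keep only the failed actions, then back off 2s, 4s, 8s, ... capped at 30s.
            batch = e.FindFailedActionsToRetry(batch, keySelector);
            Thread.Sleep(TimeSpan.FromSeconds(Math.Min(Math.Pow(2, attempt), 30)));
        }
    }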

Bruce Johnston
  • Thanks Bruce. I see that it worked. I've changed my code to this: var retryBatch = e.FindFailedActionsToRetry(batch, searchDoc => searchDoc.id); – richard Oct 19 '16 at 05:16
  • Ironically I had my code exponentially backing off but for this post and simplicity I changed it to just flat 5 seconds. I'll change it again. How many retries with exponential increases would you recommend? I have mine set at 5 currently. – richard Oct 19 '16 at 05:19
  • 2
    You could keep retrying as long as you're making progress (batch has fewer items than on the last Index call), and cap the number of retries only when you don't make progress. In that case, max retry count should be based on how long you're willing to wait since the delay exponentially increases. Past a certain point you could switch from exponential to constant delay (e.g. after the delay reaches a few minutes, or whatever you find works for you). – Bruce Johnston Oct 19 '16 at 09:56
  • Any way that this can be tested? Like making the index operation fail in between the 1000 records to see if an exception is thrown? – Kevin Cohen Aug 17 '17 at 18:17
  • @KevinCohen Have you tried mocking IDocumentsOperations.IndexWithHttpMessagesAsync? https://learn.microsoft.com/dotnet/api/microsoft.azure.search.idocumentsoperations?view=azure-dotnet – Bruce Johnston Aug 17 '17 at 22:16
  • @BruceJohnston Hi Bruce, could you elaborate more on your comment? I know this function exists but how could I use it to force the indexing to fail? put random headers? – Kevin Cohen Aug 18 '17 at 16:54
  • @KevinCohen What is your goal in terms of test coverage? Are you trying to exercise the code you've written to handle IndexBatchException, and you just want to provoke one? Or are you trying to do an actual end-to-end "chaos monkey" type of test? – Bruce Johnston Aug 18 '17 at 19:54
  • I wanted to provoke one. But got help suggesting that I could use the merge (without having the document already in azure search) and so this would fail. Now I am stuck in the retry. Could you take a look at this? https://stackoverflow.com/questions/45764070/azure-search-findfailedactionstoretry-returning-empty-actions – Kevin Cohen Aug 18 '17 at 19:57
  • Consider using a [Polly](https://www.nuget.org/packages/polly) retry policy to control retry delays and other such factors. – bugged87 Mar 17 '20 at 19:23
  • There are mechanisms in the SDK for retry handling, although they don't apply to 207 responses. We're currently working on a new .NET SDK that should handle more of these partial failure cases for you, although that's several months out at least. – Bruce Johnston Mar 17 '20 at 23:25

I've taken Bruce's recommendations in his answer and comment and implemented it using Polly.

  • Exponential backoff up to one minute, after which it retries every other minute.
  • Retry as long as there is progress. Timeout after 5 requests without any progress.
  • IndexBatchException is also thrown for unknown documents. I chose to ignore such non-transient failures since they are likely indicative of requests which are no longer relevant (e.g., removed document in separate request).
    int curActionCount = work.Actions.Count();
    int noProgressCount = 0;

    await Polly.Policy
        .Handle<IndexBatchException>() // One or more of the actions has failed.
        .WaitAndRetryForeverAsync(
            // Exponential backoff (2s, 4s, 8s, 16s, ...) and constant delay after 1 minute.
            retryAttempt => TimeSpan.FromSeconds( Math.Min( Math.Pow( 2, retryAttempt ), 60 ) ),
            (ex, _) =>
            {
                var batchEx = ex as IndexBatchException;
                work = batchEx.FindFailedActionsToRetry( work, d => d.Id );

                // Verify whether any progress was made.
                int remainingActionCount = work.Actions.Count();
                if ( remainingActionCount == curActionCount ) ++noProgressCount;
                curActionCount = remainingActionCount;
            } )
        .ExecuteAsync( async () =>
        {
            // Limit retries if no progress is made after multiple requests.
            if ( noProgressCount > 5 )
            {
                throw new TimeoutException( "Updating Azure search index timed out." );
            }

            // Only retry if the error is transient (determined by FindFailedActionsToRetry).
            // IndexBatchException is also thrown for unknown document IDs;
            // consider them outdated requests and ignore.
            if ( curActionCount > 0 )
            {
                await _search.Documents.IndexAsync( work );
            }
        } );
Steven Jeuris