I have an API (.NET Core 2.2) that retrieves documents from Cosmos DB using SDK v3.5.0. Some requests currently throw an exception because the call to Cosmos DB times out with a 408 status code. About 90% of the requests are processed successfully.

Using telemetry to log the API's activity, I noticed that in the dependencies table (which logs every request to Cosmos DB) the requests that last more than 60 s are initiated by a command named "Create/query document". Those are the requests that time out.

On the other hand, all the other requests use the command "Query documents", which responds in under 5 s, and those requests finish successfully. For more context, the error stack trace, the general specifications, and the way I query the documents are detailed below.

Stack Trace

    Microsoft.Azure.Cosmos.CosmosException : Response status code does not indicate success: 408 Substatus: 0 Reason: (Microsoft.Azure.Cosmos.CosmosException : Response status code does not indicate success: 408 Substatus: 0 Reason: (Microsoft.Azure.Documents.RequestTimeoutException: GatewayStoreClient Request Timeout. Start Time:1/30/2020 4:18:00 AM; Total Duration:00:01:05.0332130; Http Client Timeout:00:01:05; Activity id: 7498789a-8e09-4c3e-96a6-31c32e4dc2d7; Inner Message: The operation was canceled.;, Request URI: /dbs/production/colls/Announcement/docs, RequestStats: , SDK: Windows/10.0.14393 cosmos-netstandard-sdk/3.4.2 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request. ---> System.Net.Sockets.SocketException: The I/O operation has been aborted because of either a thread exit or an application request
       --- End of inner exception stack trace ---
       at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error)
       at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
       at System.Net.Security.SslStreamInternal.<FillBufferAsync>g__InternalFillBufferAsync|38_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial)
       at System.Net.Security.SslStreamInternal.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)
       at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
       --- End of inner exception stack trace ---
       at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
       at System.Net.Http.HttpConnectionPool.SendWithNtConnectionAuthAsync(HttpConnection connection, HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
       at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
       at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
       at System.Net.Http.DiagnosticsHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
       at Microsoft.Azure.Cosmos.DocumentClient.HttpRequestMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
       at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
       at Microsoft.Azure.Cosmos.GatewayStoreClient.<>c__DisplayClass14_0.<<InvokeClientAsync>b__0>d.MoveNext()
       --- End of inner exception stack trace ---
       at Microsoft.Azure.Cosmos.GatewayStoreClient.<>c__DisplayClass14_0.<<InvokeClientAsync>b__0>d.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
       at Microsoft.Azure.Documents.BackoffRetryUtility`1.ExecuteRetryAsync(Func`1 callbackMethod, Func`3 callShouldRetry, Func`1 inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action`1 preRetryCallback)
       at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)
       at Microsoft.Azure.Documents.BackoffRetryUtility`1.ExecuteRetryAsync(Func`1 callbackMethod, Func`3 callShouldRetry, Func`1 inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action`1 preRetryCallback)
       at Microsoft.Azure.Cosmos.GatewayStoreClient.InvokeClientAsync(DocumentServiceRequest request, ResourceType resourceType, Uri physicalAddress, CancellationToken cancellationToken)
       at Microsoft.Azure.Cosmos.GatewayStoreClient.InvokeAsync(DocumentServiceRequest request, ResourceType resourceType, Uri physicalAddress, CancellationToken cancellationToken)
       at Microsoft.Azure.Cosmos.GatewayStoreModel.ProcessMessageAsync(DocumentServiceRequest request, CancellationToken cancellationToken)
       at Microsoft.Azure.Cosmos.Handlers.TransportHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)).
    StatusCode = RequestTimeout;
    SubStatusCode = 0;
    ActivityId = 7498789a-8e09-4c3e-96a6-31c32e4dc2d7;
    RequestCharge = 0;
    {"ActivityId":"7498789a-8e09-4c3e-96a6-31c32e4dc2d7","StatusCode":408,"SubStatusCode":0,"RequestCharge":0.0,"ErrorMessage":"Microsoft.Azure.Documents.RequestTimeoutException: GatewayStoreClient Request Timeout. Start Time:1/30/2020 4:18:00 AM; Total Duration:00:01:05.0332130; Http Client Timeout:00:01:05; Activity id: 7498789a-8e09-4c3e-96a6-31c32e4dc2d7; Inner Message: The operation was canceled.;, Request URI: /dbs/production/colls/Announcement/docs, RequestStats: , SDK: Windows/10.0.14393 cosmos-netstandard-sdk/3.4.2 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request. ---> System.Net.Sockets.SocketException: The I/O operation has been aborted because of either a thread exit or an application request\r\n   --- End of inner exception stack trace ---\r\n   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error)\r\n   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)\r\n   at System.Net.Security.SslStreamInternal.<FillBufferAsync>g__InternalFillBufferAsync|38_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial)\r\n   at System.Net.Security.SslStreamInternal.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)\r\n   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)\r\n   --- End of inner exception stack trace ---\r\n   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)\r\n   at System.Net.Http.HttpConnectionPool.SendWithNtConnectionAuthAsync(HttpConnection connection, HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)\r\n   at 
System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)\r\n   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)\r\n   at System.Net.Http.DiagnosticsHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)\r\n   at Microsoft.Azure.Cosmos.DocumentClient.HttpRequestMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)\r\n   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)\r\n   at Microsoft.Azure.Cosmos.GatewayStoreClient.<>c__DisplayClass14_0.<<InvokeClientAsync>b__0>d.MoveNext()\r\n   --- End of inner exception stack trace ---\r\n   at Microsoft.Azure.Cosmos.GatewayStoreClient.<>c__DisplayClass14_0.<<InvokeClientAsync>b__0>d.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at Microsoft.Azure.Documents.BackoffRetryUtility`1.ExecuteRetryAsync(Func`1 callbackMethod, Func`3 callShouldRetry, Func`1 inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action`1 preRetryCallback)\r\n   at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)\r\n   at Microsoft.Azure.Documents.BackoffRetryUtility`1.ExecuteRetryAsync(Func`1 callbackMethod, Func`3 callShouldRetry, Func`1 inBackoffAlternateCallbackMethod, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action`1 preRetryCallback)\r\n   at Microsoft.Azure.Cosmos.GatewayStoreClient.InvokeClientAsync(DocumentServiceRequest request, ResourceType resourceType, Uri physicalAddress, CancellationToken cancel
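
One detail in the message above is worth reading closely: `Total Duration` (00:01:05.0332130) is just past `Http Client Timeout` (00:01:05), so the request ran until the gateway HTTP client gave up instead of failing early. A minimal sketch for pulling those two fields out of logged messages follows. `TimeoutMessageSketch`, `ReadDuration` and `HitClientTimeout` are hypothetical names, not SDK APIs, and the field layout is assumed to match the message shown here.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the Cosmos SDK: it only parses the
// "Name:hh:mm:ss[.fffffff]" fields that appear in the error text above.
public static class TimeoutMessageSketch
{
    // Returns the duration that follows "<fieldName>:" or null if the field is absent.
    public static TimeSpan? ReadDuration(string message, string fieldName)
    {
        Match match = Regex.Match(
            message,
            Regex.Escape(fieldName) + @":\s*(\d{2}:\d{2}:\d{2}(?:\.\d+)?)");
        if (!match.Success)
        {
            return null;
        }
        return TimeSpan.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);
    }

    // True when the request lasted at least as long as the HTTP client timeout,
    // i.e. the client cancelled the call rather than the service rejecting it.
    public static bool HitClientTimeout(string message)
    {
        TimeSpan? total = ReadDuration(message, "Total Duration");
        TimeSpan? timeout = ReadDuration(message, "Http Client Timeout");
        return total.HasValue && timeout.HasValue && total.Value >= timeout.Value;
    }
}
```

Calling `TimeoutMessageSketch.HitClientTimeout(exception.Message)` on the logged exceptions would show whether every failure is a client-side cancellation at the timeout limit, as it is in the trace above.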

Additional information

  • API Specifications
    • App Service deployed in West US, West Europe, Southeast Asia and Brazil South regions
    • .NET Core 2.2
    • Microsoft.Azure.Cosmos v3.5.0 SDK
    • CosmosDB client connection
      • Connection mode: Direct
      • Application Region: West US
      • Default values for the rest
    • Write region -> West US
    • Read regions -> West US, West Europe, Southeast Asia and Brazil South
  • As the documentation suggests, I have only one connection to Cosmos DB for the entire application
  • To retrieve documents, I’m using the following implementation:
    var feed = container.GetItemLinqQueryable<T>(false, null, queryRequestOptions).Where(predicate).ToFeedIterator();
    var batches = new List<FeedResponse<T>>();
    while (feed.HasMoreResults)
    {
        var batch = await feed.ReadNextAsync();
        batches.Add(batch);
    }
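
For reference, the client settings and query loop listed above can be sketched as follows. This is an illustration under assumptions, not the exact production code: `CosmosQuerySketch`, `BuildOptions`, `CreateClient` and `QueryAllAsync` are hypothetical names, and the per-page timing written to the console is an addition meant to show which page of a query is slow.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Linq.Expressions;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;

public static class CosmosQuerySketch
{
    // Mirrors the bullets above: Direct mode, West US, defaults for the rest.
    public static CosmosClientOptions BuildOptions()
    {
        return new CosmosClientOptions
        {
            ConnectionMode = ConnectionMode.Direct,
            ApplicationRegion = Regions.WestUS
        };
    }

    // Create this once and share it for the lifetime of the application.
    public static CosmosClient CreateClient(string connectionString)
    {
        return new CosmosClient(connectionString, BuildOptions());
    }

    // Same paging loop as above, with elapsed time and request charge per page.
    public static async Task<List<T>> QueryAllAsync<T>(
        Container container,
        Expression<Func<T, bool>> predicate,
        QueryRequestOptions queryRequestOptions = null)
    {
        FeedIterator<T> feed = container
            .GetItemLinqQueryable<T>(false, null, queryRequestOptions)
            .Where(predicate)
            .ToFeedIterator();

        var items = new List<T>();
        while (feed.HasMoreResults)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            FeedResponse<T> page = await feed.ReadNextAsync();
            stopwatch.Stop();

            Console.WriteLine(
                $"page: {page.Count} items, {page.RequestCharge} RU, {stopwatch.ElapsedMilliseconds} ms");
            items.AddRange(page);
        }
        return items;
    }
}
```

Logging each page separately makes it easier to tell whether a slow request is one long round trip or many pages added together.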

Juanjo
  • You can refer to https://learn.microsoft.com/en-us/azure/cosmos-db/troubleshoot-dot-net-sdk#request-timeouts to troubleshoot the issue. – Jim Xu Feb 06 '20 at 01:34
  • We have the same issue when a bunch of requests arrive at once. Originally we used Direct mode and got a different error: "Response status code does not indicate success: 503 Substatus: 0 Reason: (Microsoft.Azure.Documents.ServiceUnavailableException: The request failed because the client was unable to establish connections to 4 endpoints across 1 regions. Please check for client resource starvation issues and verify connectivity between client and server.". Then we tried Gateway mode and got the same result as you. – Pavel Cermak Feb 26 '20 at 12:44
  • Are you going through Apigee? – TyngeOfTheGinge Mar 17 '20 at 19:55
  • @TyngeOfTheGinge no, I'm not going through Apigee. – Juanjo Mar 23 '20 at 18:44
  • @Juanjo what came of this? We are experiencing this exception, with DirectMode enabled. – gorillapower Jul 24 '20 at 09:00
  • @gorillapower I never found out the cause of these exceptions. They occurred in the PROD environment, and as a software vendor I never had access to the Cosmos DB metrics for that environment, so I could not look for clues about the cause. – Juanjo Jul 27 '20 at 23:04
  • Maybe too late, but we also faced such issues. We get `408`, `410`, `503` "sometimes" :( ... After testing everything, we figured out that it always occurred in a high-load scenario when the Cosmos container is partitioning **physically**. The only solution was to reinstantiate the `CosmosClient` ([Polly](https://www.nuget.org/packages/Polly/)). IMO the client then makes some new connections or physical path discoveries. – Martin Oct 21 '20 at 22:16
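
The workaround in the last comment, retrying and recreating the client in between, can be sketched without any library. This is a hand-rolled retry with exponential backoff, not Polly's API. `TransientRetrySketch`, `ExecuteAsync`, `isTransient` and `beforeRetry` are hypothetical names, and whether recreating the `CosmosClient` helps in a given environment is the commenter's observation, not something confirmed here.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retries an operation on failures the caller marks as
// transient, waiting longer each time and running a hook before each retry.
public static class TransientRetrySketch
{
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        Func<Task> beforeRetry = null,
        int maxAttempts = 3,
        TimeSpan? baseDelay = null)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }
        TimeSpan delay = baseDelay ?? TimeSpan.FromMilliseconds(200);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Swallow only transient failures while attempts remain;
                // anything else propagates to the caller unchanged.
            }

            // Exponential backoff: baseDelay, 2x, 4x, ...
            double factor = Math.Pow(2, attempt - 1);
            await Task.Delay(TimeSpan.FromMilliseconds(delay.TotalMilliseconds * factor));

            // Hook for the commenter's idea, e.g. dispose and recreate the client.
            if (beforeRetry != null)
            {
                await beforeRetry();
            }
        }
    }
}
```

With the Cosmos SDK, `isTransient` could be something like `ex => ex is CosmosException ce && ((int)ce.StatusCode == 408 || (int)ce.StatusCode == 503)`, and `beforeRetry` could swap in a fresh `CosmosClient`. Both are assumptions to validate under load before relying on them.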
