
I am using Elasticsearch.NET (5.6) in an ASP.NET API (.NET 4.6) on Windows, and I am trying to publish to Elasticsearch hosted on AWS (I have tried both 5.1.1 and 6; the behaviour is the same).

I have the following code, which bulk indexes documents into Elasticsearch. Imagine calling the code block below many times:

        var node = new System.Uri(restEndPoint);
        var settings = new ConnectionSettings(node);
        var lowlevelClient = new ElasticLowLevelClient(settings);

        var index = indexStart + indexSuffix;

        var items = new List<object>(list.Count() * 2);
        foreach (var conn in list)
        {
            items.Add(new { index = new { _index = index, _type = "doc", _id = getId(conn) } });
            items.Add(conn);
        }

        try
        {
            var indexResponse = lowlevelClient.Bulk<Stream>(items);
            if (indexResponse.HttpStatusCode != 200)
            {
                throw new Exception(indexResponse.DebugInformation);
            }

            return indexResponse.HttpStatusCode;
        }
        catch (Exception ex)
        {
            ExceptionManager.LogException(ex, "Cannot publish to ES");
            return null;
        }

It runs fine and publishes documents to Elasticsearch, but only for 80 calls; after 80 times, every call fails with this exception:

# OriginalException: System.Net.WebException: The operation has timed out
   at System.Net.HttpWebRequest.GetRequestStream(TransportContext& context)
   at System.Net.HttpWebRequest.GetRequestStream()
   at Elasticsearch.Net.HttpConnection.Request[TReturn](RequestData requestData) in C:\Users\russ\source\elasticsearch-net-5.x\src\Elasticsearch.Net\Connection\HttpConnection.cs:line 148

The most interesting part: I have tried changing the bulk size to 200 and to 30, and it fails after 16,000 and 2,400 documents respectively, meaning both end up at exactly 80 requests. (Each document is very similar in size.)

Any ideas? Thanks

Xin
  • How big is the one request that you're trying to send i.e. how many items in the list, and what's the overall number of bytes within one request? – Russ Cam Jan 15 '18 at 04:57
  • Also, what version of NEST are you using? What version of Elasticsearch are you targeting? What OS and .NET framework version? – Russ Cam Jan 15 '18 at 04:59
  • @RussCam Thanks for asking that, I've just updated my question. – Xin Jan 15 '18 at 05:28
  • Do you have any idea why the lib just begins to timeout right after 80 times bulk operation? – Xin Jan 15 '18 at 05:29
  • I suspect the `Stream` returned from each call is not being disposed of, and that you're hitting the default connection limit (of 80) for the number of open concurrent connections: https://github.com/elastic/elasticsearch-net/blob/master/src/Elasticsearch.Net/Configuration/ConnectionConfiguration.cs#L35 – Russ Cam Jan 15 '18 at 05:45
  • Maybe assign the `indexResponse.HttpStatusCode` to a variable, dispose the `indexResponse.Body` stream, and return the variable? – Russ Cam Jan 15 '18 at 05:46
  • One thing also not being handled is that some items/operations can fail within a bulk request, but the response is still 200. You're not handling/checking to see if any items have failed, and taking appropriate action. The high level client, NEST, has properties (`ItemsWithErrors`) and methods (`BulkAll` with retry semantics) to help with this – Russ Cam Jan 15 '18 at 05:49
  • Thank you very much @RussCam I finally get the solution, which should make use of `VoidResponse` – Xin Jan 15 '18 at 23:31

1 Answer


There is a connection limit of 80 concurrent connections by default (also refer to the comments from @RussCam under the question). So the real issue is that the `Stream` in each response was never disposed, and each undisposed stream was holding its connection open until the limit was exhausted.

So the fix is either to dispose the response stream via `indexResponse.Body.Dispose()` (I haven't tried this one) or to use `VoidResponse`: `lowlevelClient.Bulk<VoidResponse>(items);`, which does not read a response stream at all. I've tried the second and it works.
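A minimal sketch of both options, reusing the `lowlevelClient` and `items` variables from the question (the second option is the untried one, shown here only as suggested in the comments):

```csharp
// Option 1 (tried, works): request VoidResponse so the client never
// hands back a response stream that could hold the connection open.
var voidResponse = lowlevelClient.Bulk<VoidResponse>(items);
int? status = voidResponse.HttpStatusCode;

// Option 2 (untried): keep Stream, but dispose the body so the
// underlying connection is released back to the pool.
var streamResponse = lowlevelClient.Bulk<Stream>(items);
using (var body = streamResponse.Body)
{
    // read the body here if you need it
}
return streamResponse.HttpStatusCode;
```

Independently of either option, `ConnectionSettings` and the client are intended to be created once and reused across requests, not constructed on every call as in the question.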

Xin
  • You can use `VoidResponse` here, but the implementation is still missing the handling of failed bulk operations. Is there a reason you're using the low level client as opposed to using the high level client, NEST? – Russ Cam Jan 16 '18 at 00:29
  • only reason is just because I only find the bulk index implementation in low level ones – Xin Jan 16 '18 at 06:14
  • The high level client has bulk APIs. Take a look at the tests: https://github.com/elastic/elasticsearch-net/blob/5.x/src/Tests/Document/Multiple/Bulk/BulkApiTests.cs and an example: https://stackoverflow.com/a/45300910/1831 – Russ Cam Jan 16 '18 at 06:25
  • Thank you. I've tried the high level client, the descriptor is good to use, and without the `80` issue. – Xin Jan 16 '18 at 23:50
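For completeness, the high-level NEST equivalent suggested in the comments might look roughly like this (a sketch against NEST 5.x; `restEndPoint`, `index`, and `list` are the names from the question):

```csharp
var settings = new ConnectionSettings(new Uri(restEndPoint));
var client = new ElasticClient(settings);

var bulkResponse = client.Bulk(b => b
    .Index(index)
    .Type("doc")
    .IndexMany(list));

// Individual operations can fail even when the HTTP status is 200,
// so check the per-item errors as well, as Russ Cam points out above.
if (bulkResponse.Errors)
{
    foreach (var itemWithError in bulkResponse.ItemsWithErrors)
    {
        // e.g. log itemWithError.Id and itemWithError.Error.Reason
    }
}
```

The high-level response is deserialized and fully consumed by the client, so the undisposed-stream problem from the question does not arise here.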