I'm sending many messages using cloudQueue.BeginAddMessage and EndAddMessage, limiting the number of Begin calls that haven't completed yet to 500. Even so, I'm getting an exception with code 10048 (meaning socket exhaustion).

Microsoft.WindowsAzure.Storage.StorageException: Unable to connect to the remote server ---> System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: Only one usage of each socket address (protocol/network address/port) is normally permitted

The solutions I found by searching all advise modifying the registry; however, since this is planned to run in an Azure worker role, I can't do that.

I have other functions that insert into the table service; they operate just as fast but do not have any problems. It seems almost as if the EndAddMessage function doesn't close the connection or something (I have limited understanding of sockets).

My question: is there a bug on Azure's side here? What should I do to fix this, other than artificially slowing the adding of messages down to a crawl?

Here's the test function I use to send messages. In my case, after about 16,500 messages have been added, with the callbacks completing properly and at a stable rate, it slows down and after a little while throws the exception mentioned above.

I am sorry for the long code, but this should be copy paste for you to reproduce the problem.

The exception is thrown from AsyncCallback endAddCallback.

    static void Main()
    {
        Console.SetBufferSize(205, Int16.MaxValue - 1);

        // Set the maximum number of concurrent connections (12*6 in my case)
        ServicePointManager.DefaultConnectionLimit = 12 * Environment.ProcessorCount;
        //setting UseNagleAlgorithm to true reduces network traffic by buffering small packets of data and transmitting them as a single packet, but setting to false can significantly reduce latencies for small packets.
        ServicePointManager.UseNagleAlgorithm = false;
        //if true, "Expect: 100-continue" header is sent to ensure a call can be made. This uses an entire roundtrip to the service point (azure), so setting to false sends the call directly.
        ServicePointManager.Expect100Continue = false;

        CloudStorageAccount storageAccount = CloudStorageAccount.Parse(__CONN_STRING);
        CloudQueueClient client = storageAccount.CreateCloudQueueClient();
        CloudQueue queue = client.GetQueueReference(__QUEUE_NAME);
        queue.CreateIfNotExists();
        List<Guid> ids = new List<Guid>();
        for (Int32 i = 0; i < 40000; i++)
            ids.Add(Guid.NewGuid());

        SendMessages(queue, ids.Select(id => new CloudQueueMessage(id.ToString())).ToList().AsReadOnly());
    }

    public static void SendMessages(CloudQueue queue, IReadOnlyCollection<CloudQueueMessage> messages)
    {
        List<CloudQueueMessage> toSend = messages.ToList();
        Object exceptionSync = new Object();
        Exception exception = null;
        CountdownEvent cde = new CountdownEvent(toSend.Count);
        AsyncCallback endAddCallback = asyncResult =>
        {
            Int32 endedItem = (Int32)asyncResult.AsyncState;
            try
            {
                queue.EndAddMessage(asyncResult);
                Console.WriteLine("SendMessages: Ended\t\t{0}\t/{1}", endedItem + 1, toSend.Count);
            }
            catch (Exception e)
            {
                Console.WriteLine("SendMessages: Error adding {0}/{1} to queue: \n{2}", endedItem + 1, toSend.Count, e);
                lock (exceptionSync)
                {
                    if (exception == null)
                        exception = e;
                }
            }
            finally { cde.Signal(); }
        };

        for (Int32 i = 0; i < toSend.Count; i++)
        {
            lock (exceptionSync)
            {
                if (exception != null)
                    throw exception;
            }
            //if number of added but not ended is larger than the MAX, yield and check again.
            while (true)
            {
                Int32 currentOngoing = i - (cde.InitialCount - cde.CurrentCount);
                if (currentOngoing > 500)
                    Thread.Sleep(5);
                else
                    break;
            }
            Console.WriteLine("SendMessages: Beginning\t{0}\t/{1}", i + 1, toSend.Count);
            queue.BeginAddMessage(toSend[i], endAddCallback, i);
        }

        cde.Wait();
        if (exception != null)
            throw exception;
        Console.WriteLine("SendMessages: Done.");
    }
David S.
  • A worker role in Azure should give you rights to modify some (if not all) keys through the OnStart event or through RDP. What is the key you can't modify? – makerofthings7 Feb 26 '13 at 15:38
  • I wasn't aware that I could modify the registry in a worker role. The keys are under HKLM\System\CurrentControlSet\Services\Tcpip\Parameters (values MaxUserPort and TcpTimedWaitDelay), as suggested in this answer and other places: http://stackoverflow.com/questions/1339142/wcf-system-net-socketexception-only-one-usage-of-each-socket-address-protoco – David S. Feb 26 '13 at 15:42
  • Yep, it's possible to edit the registry. There are 2 ways: [A startup task, or the OnStart event](http://blogs.msdn.com/b/avkashchauhan/archive/2011/12/23/how-to-modify-registry-keys-in-windows-azure-virtual-machine-from-a-web-or-worker-role.aspx) – makerofthings7 Feb 26 '13 at 15:43
  • I will try this and report back, thanks. I don't know why I assumed it couldn't be done. However, it is weird that I don't get this error with table client, even though it's very fast as well. – David S. Feb 26 '13 at 15:47
  • In any case, I should remark that this will only move the problem or make it less visible (and use more resources), and that there still seems to be a bug where sockets aren't closed. – David S. Feb 26 '13 at 15:55
  • Agreed. From what I learned from using the TPL, there are two types of threads: IO threads and Process threads. Perhaps the async implementation of Queues isn't using the same implementation as Tables. Regardless, I'd be interested in seeing your work on Tables if you are willing. I try to never stop learning, and I think your code is interesting (I'm new to async too, but wrapping it in the TPL). – makerofthings7 Feb 26 '13 at 16:01
  • @makerofthings7 See my answer to Joe for an update. I can send you code for tables if you want, don't know how relevant it would be to paste it in this thread. – David S. Mar 04 '13 at 12:46
  • Any update on this issue? I'm seeing this with Azure websites posting to an Azure queue on Storage version 3.0.3 – Yoenhofen Apr 17 '14 at 14:48

3 Answers

The Cloud[Blob|Table|Queue]Client does not maintain state, and can be used across many objects.

This issue is related to ServicePointManager becoming overloaded. Queue stress scenarios tend to exacerbate this behavior since they perform many small requests (in your case a GUID, which is quite small). There are a few mitigations you can try that should alleviate this issue:

  • The Nagle algorithm is designed to help in this case by batching small requests together at the TCP layer. Setting it to true may slightly increase per-message latency in some cases, but under stress this will most likely be negligible, as requests will not need to wait long to exceed the window Nagle is looking for (1400 bytes).
  • Increase ServicePointManager.DefaultConnectionLimit to account for rapid opening and closing of sockets.
  • Can you provide more information about where your code is running, and whether any other code is using connections while this runs? By default, client requests send keep-alive = true, which should keep persistent connections to the Azure service open and allow multiple requests to use the same socket without having to open, close, or reconnect.
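
The first two mitigations can also be applied to a single endpoint rather than process-wide. The following is a minimal sketch, assuming the storage client issues its requests through the HttpWebRequest stack that ServicePointManager governs. The class name `QueueConnectionTuning`, the method `Apply`, the account URL, and the limit of 48 are all illustrative assumptions, not values recommended anywhere in this thread.

```csharp
using System;
using System.Net;

public static class QueueConnectionTuning
{
    // Hypothetical helper: looks up the ServicePoint for one endpoint and
    // adjusts its connection limit and Nagle setting. Settings only affect
    // connections opened after this call, so call it before the first request.
    public static ServicePoint Apply(Uri queueEndpoint, int connectionLimit, bool useNagle)
    {
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(queueEndpoint);
        servicePoint.ConnectionLimit = connectionLimit;   // max concurrent connections to this endpoint
        servicePoint.UseNagleAlgorithm = useNagle;        // true = batch small packets at the TCP layer
        return servicePoint;
    }
}
```

For example, `QueueConnectionTuning.Apply(new Uri("https://myaccount.queue.core.windows.net/"), 48, true)` would tune only the queue endpoint of a hypothetical account named `myaccount`, leaving table and blob traffic on the process-wide defaults.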

Also, regarding your comment that table entities don't show the same behavior: the current wire protocol that the Table service supports is AtomPub, which can be quite chatty (XML etc.). A simple entity insert is therefore much larger than a simple queue GUID message. Essentially, because of the size difference, the table traffic does a better job of utilizing the TCP layer below it, so this isn't a true apples-to-apples comparison.

If these solutions do not work for you, it would be helpful to get a few more pieces of information regarding your account so we can look at this on the back end.

joe

Joe Giardino
  • Thanks Joe - I've now tried several things. Enabling the Nagle algorithm had no noticeable effect; I still get the exception. Increasing the connection limit fivefold: still the exception (maybe just a few seconds later than usual). Enabling Nagle and increasing the connection limit together: still the exception as well. I also tried decreasing the TcpTimedWaitDelay registry value to 30 seconds. This had some effect, but when running more instances of the sending application or sending to different queues simultaneously, the exception still appears very quickly. What information do you need about the account? – David S. Mar 04 '13 at 12:40
  • Oh, and the code is running on my office computer. I've tried on several machines, also at home. All machines show thousands of TIME_WAIT entries in netstat -n output, but just a few or even none when inserting into table storage. – David S. Mar 04 '13 at 12:45
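
The TIME_WAIT count mentioned above can also be sampled from inside the sending process instead of reading netstat -n by hand. This is a minimal sketch using the standard `System.Net.NetworkInformation` API; the class name `TcpStateProbe` and the method `CountConnections` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Net.NetworkInformation;

public static class TcpStateProbe
{
    // Hypothetical helper: counts the machine's active TCP connections
    // that are currently in the given state (for example TcpState.TimeWait).
    public static int CountConnections(TcpState state)
    {
        return IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpConnections()
            .Count(connection => connection.State == state);
    }
}
```

Logging `TcpStateProbe.CountConnections(TcpState.TimeWait)` every few hundred messages during the queue test and again during the table test would show whether the count climbs only in the queue case, as the netstat output suggests.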

I suspect it's because CloudQueueClient isn't meant for multithreaded (async) access as you're doing it.

Try recreating the CloudQueue in SendMessages like this

    CloudQueueClient client = storageAccount.CreateCloudQueueClient();
    CloudQueue queue = client.GetQueueReference(__QUEUE_NAME);

I've read in numerous forums that a CloudXXClient is meant to be used once and disposed. That principle might apply here.

There isn't much efficiency to be gained by reusing the client, as its constructor doesn't send a request to the queue, and a shared client may have threading issues.

makerofthings7
  • Thanks for the reply. I tried to do what you said, for each iteration in the for loop, and also sent the created `queue` for the iteration as a state object to the callback so that I used it for the `EndAddMessage` function as well. Unfortunately it did not work, I get the same exception. – David S. Feb 26 '13 at 15:17
  • Can you try moving where CloudQueueMessage is being created? Perhaps there is some weird C# closure happening. Try creating that object in the function. – makerofthings7 Feb 26 '13 at 15:31
  • Tried it - even if I declare and instantiate each message in the loop, I get the same exception. Same for client/queue. I have no closure warnings from ReSharper either. – David S. Feb 26 '13 at 15:36
  • Are any of the objects Disposable and may require clean up? – makerofthings7 Feb 26 '13 at 15:40

This has now been solved in Storage Client Library 2.0.5.1.

Alternatively, there is also a workaround: uninstalling KB2750149.

David S.