We have a server application that communicates with clients over TCP sockets. After running for a few weeks, it crashes with a NullReferenceException that cannot be handled. I have been able to reproduce the exception with a very small console program, but the exception appears to be thrown, unhandled, on the sockets' internal thread pool. So I cannot catch it with any try/catch block, since that code is not under my control.

Does anybody have any idea about this? Is it a framework bug, or how can I catch the exception on the socket thread pool (so that our application does not crash)? Here is the example code that generates the exception after a few iterations (3-10). It is important to know that the server is offline, so the socket cannot connect. I am using Visual Studio 2010 and .NET Framework 4.0.

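using System;
using System.Diagnostics;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

internal class Program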
{
    private static string host;

    private static Socket socket;

    private static void Main(string[] args)
    {
        Trace.Listeners.Add(new ConsoleTraceListener());

        AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(CurrentDomain_UnhandledException);

        socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);

        host = "127.0.0.1";
        //the problem also occurs when the host is another IP address on the network
        //host = "192.168.0.1";

        //when started as a task on another thread, it does not crash the application
        //Task.Factory.StartNew(() => StartConnecting());

        //also crashing the application
        //Task.Factory.StartNew(() => StartConnecting(), TaskCreationOptions.LongRunning);

        //when it is regular thread the exception occurs
        ///*
        var thread = new Thread(new ThreadStart(StartConnecting));
        thread.Start();
        //*/

        //when it is blocking exception also occurs
        //StartConnecting();
        Console.WriteLine("Press any key to exit ...");
        Console.ReadKey();
    }

    private static void StartConnecting()
    {
        try
        {
            int count = 0;
            while (true)
            {
                try
                {
                    // should I switch to Socket.Connect(...) instead?
                    Trace.WriteLine(string.Format("Connect Try {0} begin", ++count));

                    var ar = socket.BeginConnect(host, 6500, new AsyncCallback(ConnectCallback), socket);

                    Trace.WriteLine(string.Format("Connect Try {0} end", count));
                }
                catch (Exception err)
                {
                    Trace.WriteLine(string.Format("[BeginConnect] error {0}", err.ToString()));
                }
                System.Threading.Thread.Sleep(1000);
                //reduce the sleep to see the exception sooner
            }
        }
        catch (Exception e)
        {
            Trace.WriteLine(string.Format("[StartConnecting] error {0}", e.ToString()));
        }
    }

    private static void CurrentDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
    {
        string msg = e.ExceptionObject.ToString();

        Trace.WriteLine(string.Format("[CurrentDomain_UnhandledException] isTerminating={0} error {1}", e.IsTerminating, msg));

        Trace.WriteLine("Exiting process");

        //the other processing threads continue working
        //without problems until the Thread.Sleep ends
        //Thread.Sleep(10000);
    }

    private static void ConnectCallback(IAsyncResult ar)
    {
        try
        {
            Trace.WriteLine("[ConnectCallback] enter");
            var socket = (Socket)ar.AsyncState;
            socket.EndConnect(ar);

            Trace.WriteLine("[ConnectCallback] exit");
        }
        catch (Exception e)
        {
            Trace.WriteLine(string.Format("[ConnectCallback] error {0}", e.ToString()));
        }
    }
}

After the application starts, the inevitable crash occurs:

[CurrentDomain_UnhandledException] isTerminating=True error System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Net.Sockets.Socket.ConnectCallback()
   at System.Net.Sockets.Socket.RegisteredWaitCallback(Object state, Boolean timedOut)
   at System.Threading._ThreadPoolWaitOrTimerCallback.PerformWaitOrTimerCallback(Object state, Boolean timedOut)
John Saunders
  • I face the same issue. I'm pretty confident this is a bug in the framework. The ConnectCallback function here http://referencesource.microsoft.com/#System/net/System/Net/Sockets/Socket.cs,7be8fddc24c74b66,references does not check that 'asyncResult' is not null, and it could be null because of some race condition. Since you have a reproducing case, you should submit it to Microsoft Connect: http://connect.microsoft.com/ – Simon Mourier Jan 31 '16 at 16:12
  • Possible duplicate of [What is a NullReferenceException and how do I fix it?](http://stackoverflow.com/questions/4660142/what-is-a-nullreferenceexception-and-how-do-i-fix-it) – Rob Feb 01 '16 at 05:42
  • @rob - certainly not. Please read carefully, this happens in .NET's own code (try the code). – Simon Mourier Feb 05 '16 at 07:24
  • I have filed a bug report with Microsoft here: https://connect.microsoft.com/VisualStudio/feedback/details/2324449/the-system-net-sockets-socket-class-can-throw-an-internal-uncatchable-nullreferenceexception – Simon Mourier Feb 09 '16 at 15:28

3 Answers

The sample code you provided repeatedly calls BeginConnect without waiting for the async operation to complete.

Roughly, you're doing this:

while(true)
{
    socket.BeginConnect(...);
    Sleep(1000);
}

So when your thread starts, it first calls BeginConnect(), then waits one second, then calls BeginConnect() again on the same socket while the previous call is still pending.

On my computer, it gives me an InvalidOperationException, but I guess the exception type may depend on the CLR version (I'm using .NET 4.5.1).

Here are 3 different solutions:

  1. Complete the async operation with Socket.EndConnect(), which blocks until the pending connect finishes (to cancel it instead, close the socket)
  2. Wait for the async operation to complete with IAsyncResult.AsyncWaitHandle.WaitOne()
  3. Don't use BeginConnect() and use Connect() instead
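
As a rough illustration of options 2 and 3 combined, here is a minimal sketch that makes one connection attempt at a time and waits for it to finish before the caller tries again. The `ConnectHelper` class, the `TryConnect` name, and the timeout parameter are hypothetical and not part of the original code; it also assumes that creating a fresh socket per attempt is acceptable for your application.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper, not from the original post.
internal static class ConnectHelper
{
    // Makes a single connection attempt on a fresh socket.
    // Returns the connected socket, or null if the attempt failed or timed out.
    public static Socket TryConnect(string host, int port, TimeSpan timeout)
    {
        var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        try
        {
            IAsyncResult ar = socket.BeginConnect(host, port, null, null);

            // Block until this attempt has completed or the timeout elapses,
            // so a second BeginConnect is never issued on the same socket.
            if (!ar.AsyncWaitHandle.WaitOne(timeout))
            {
                // Closing the socket cancels the pending connect.
                socket.Close();
                return null;
            }

            // Throws SocketException if the connection was refused.
            socket.EndConnect(ar);
            return socket;
        }
        catch (SocketException)
        {
            socket.Close();
            return null;
        }
    }
}
```

The retry loop from the question could then call `ConnectHelper.TryConnect(host, 6500, TimeSpan.FromSeconds(5))` and sleep only after it returns null, so that every attempt starts from a clean socket.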
Benoit Blanchon
  • More important for me is to understand why the exception is not caught by the try/catch blocks (in the code above, all exceptions should be caught and traced), but instead goes straight to the application domain's unhandled exception handler (causing a total crash of the app)? – Андрей Кунчев Oct 22 '13 at 18:54
  • Well, when I tested, the exception was caught in the `[BeginConnect] error {0}` handler. Anyway, there is something seriously wrong in your code. You should try to fix it before looking for bugs in the .NET framework. – Benoit Blanchon Oct 22 '13 at 20:06

I'm pretty confident this uncatchable error is caused by a bug in the Socket code, and you should report it to Microsoft Connect.

Here is an extract from the Socket.cs code at .NET reference source: http://referencesource.microsoft.com/#System/net/System/Net/Sockets/Socket.cs,938ed6a18154d0fc

private void ConnectCallback()
{
  LazyAsyncResult asyncResult = (LazyAsyncResult) m_AcceptQueueOrConnectResult;

  // If we came here due to a ---- between BeginConnect and Dispose
  if (asyncResult.InternalPeekCompleted)
  {
     // etc.
      return;
  }
}

This callback is called by another static method:

private static void RegisteredWaitCallback(object state, bool timedOut)
{
  Socket me = (Socket)state;

  // Interlocked to avoid a race condition with DoBeginConnect
  if (Interlocked.Exchange(ref me.m_RegisteredWait, null) != null)
  {
    switch (me.m_BlockEventBits)
    {
    case AsyncEventBits.FdConnect:
      me.ConnectCallback();
      break;

    case AsyncEventBits.FdAccept:
      me.AcceptCallback(null);
      break;
    }
  }
}

This static method is never unregistered and is always called, but it relies on the m_RegisteredWait member to decide whether it should pass the call on to the socket's instance method.

I suppose the problem is that m_RegisteredWait is sometimes not null while m_AcceptQueueOrConnectResult is null. ConnectCallback then dereferences a null asyncResult on a thread pool thread where you cannot catch the exception.

That being said, the root cause is that your code has problems in the first place, as others have noted. To avoid this horrible uncatchable error, just make sure you call Close or Dispose on the socket when an error happens; this will internally clear the m_RegisteredWait member. For example, the BeginConnect documentation says this:

To cancel a pending call to the BeginConnect method, close the Socket. When the Close method is called while an asynchronous operation is in progress, the callback provided to the BeginConnect method is called. A subsequent call to the EndConnect method will throw an ObjectDisposedException to indicate that the operation has been cancelled.

In your example, just add the following line to your callback code:

 private static void ConnectCallback(IAsyncResult ar)
    {
        try
        {
         ...
        }
        catch (Exception e)
        {
          // 'socket' is the static field from the question's Program class
          if (socket != null) socket.Dispose();
        }
    }

Now, you'll still have errors but they will be normal errors.
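
If you want to stay with the asynchronous API, one way to combine "dispose on error" with retrying is sketched below. It is only an illustration under some assumptions: the `Reconnector` class, its constructor parameters, and the `onConnected` delegate are hypothetical names that do not appear in the original code, and it assumes a new socket per attempt is acceptable. The point is that each socket sees at most one BeginConnect, and a failed socket is disposed before the next attempt is scheduled.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;

// Hypothetical class, not from the original post.
internal sealed class Reconnector
{
    private readonly string host;
    private readonly int port;
    private readonly int retryDelayMs;
    private readonly Action<Socket> onConnected;
    private readonly Timer retryTimer;

    public Reconnector(string host, int port, int retryDelayMs, Action<Socket> onConnected)
    {
        this.host = host;
        this.port = port;
        this.retryDelayMs = retryDelayMs;
        this.onConnected = onConnected;
        // Created stopped; armed as a one-shot timer by ScheduleRetry.
        this.retryTimer = new Timer(_ => BeginAttempt(), null, Timeout.Infinite, Timeout.Infinite);
    }

    public void Start()
    {
        BeginAttempt();
    }

    private void BeginAttempt()
    {
        // A fresh socket per attempt: BeginConnect is never called twice on one socket.
        var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        try
        {
            socket.BeginConnect(host, port, OnConnect, socket);
        }
        catch (SocketException)
        {
            socket.Dispose();
            ScheduleRetry();
        }
    }

    private void OnConnect(IAsyncResult ar)
    {
        var socket = (Socket)ar.AsyncState;
        try
        {
            socket.EndConnect(ar);
        }
        catch (SocketException)
        {
            // Dispose the failed socket before trying again.
            socket.Dispose();
            ScheduleRetry();
            return;
        }
        onConnected(socket);
    }

    private void ScheduleRetry()
    {
        // Fire once after the delay instead of sleeping on a thread pool thread.
        retryTimer.Change(retryDelayMs, Timeout.Infinite);
    }
}
```

A caller would create it with something like `new Reconnector("127.0.0.1", 6500, 1000, s => { /* use the connected socket */ })` and call `Start()` once.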

Simon Mourier
If you look carefully at the stack trace, you'll see that the NullReferenceException occurs in System.Net.Sockets.Socket.ConnectCallback. If you look at your code, you'll see that you have a method named ConnectCallback.

That's what we call a "coincidence".

Please change the name of your callback method to MyConnectCallback, and change the BeginConnect call to:

var ar = socket.BeginConnect(host, 6500, new AsyncCallback(MyConnectCallback), socket);

See if that changes anything.

If I'm correct, and your ConnectCallback method is never called, then I'm also forced to wonder how your code works at all.

John Saunders