I'm using SignalR version 2.1.2 with SignalR.Redis 2.1.2 on Server 2012 R2, IIS 8.5 with WebSockets enabled.

Everything runs perfectly in my development environment. I can even stand up copies of the site on different servers (e.g. http://machine1/myapp/signalr, http://machine2/myapp/signalr), configured to use the same backplane, and both UIs receive the messages published to them.

I then moved "myapp" to our next environment, which is a cluster of two machines behind an F5 load balancer, with a DNS alias set up to route to the F5, which then round-robins requests to "myapp". The website itself can connect to SignalR just fine and receives the published messages it subscribes to, BUT when I try to publish to the site via the alias (e.g. http://myappalias/signalr), I get a 400 Bad Request response. Here is an example of the error:

  InnerException: Microsoft.AspNet.SignalR.Client.Infrastructure.StartException
       _HResult=-2146233088
       _message=Error during start request. Stopping the connection.
       HResult=-2146233088
       IsTransient=false
       Message=Error during start request. Stopping the connection.
       InnerException: System.AggregateException
            _HResult=-2146233088
            _message=One or more errors occurred.
            HResult=-2146233088
            IsTransient=false
            Message=One or more errors occurred.
            InnerException: Microsoft.AspNet.SignalR.Client.HttpClientException
                 _HResult=-2146233088
                 _message=StatusCode: 400, ReasonPhrase: 'Bad Request', Version: 1.1, Content: System.Net.Http.StreamContent, Headers:
{
  Pragma: no-cache
  Transfer-Encoding: chunked
  X-Content-Type-Options: nosniff
  Persistent-Auth: true
  Cache-Control: no-cache
  Date: Thu, 13 Nov 2014 22:30:22 GMT
  Server: Microsoft-IIS/8.5
  X-AspNet-Version: 4.0.30319
  X-Powered-By: ASP.NET
  Content-Type: text/html
  Expires: -1
}

Here is the test code I'm using to publish test messages to each environment. It fails on "connection.Start().Wait()":

using System;
using Microsoft.AspNet.SignalR.Client;

class Program
{
    static void Main(string[] args)
    {
        var connection = new HubConnection("http://myappalias/signalr");

        // Authenticate as the current Windows user.
        connection.Credentials = System.Net.CredentialCache.DefaultNetworkCredentials;

        var proxy = connection.CreateHubProxy("MyAppHub");

        // This is the call that fails with the 400 Bad Request shown above.
        connection.Start().Wait();

        ConsoleKeyInfo key = Console.ReadKey();

        do
        {
            // Message is defined elsewhere in the application (not shown).
            proxy.Invoke("NewMessage", new Message() { Payload = "Hello" });

            Console.WriteLine("Message fired.");

            key = Console.ReadKey();

        } while (key.Key != ConsoleKey.Escape);
    }
}

Now, if I don't use "myappalias" and instead hit a server directly, it works perfectly. It appears that either the F5 is the problem, the client needs to be configured differently for this scenario, or I have to do something different when setting up SignalR's startup class. Here is the startup class I'm using:

[assembly: OwinStartup(typeof(MyApp.Startup))]
namespace MyApp
{
    public class Startup
    {
        private static readonly ILog log = LogManager.GetLogger
        (System.Reflection.MethodBase.GetCurrentMethod().DeclaringType);

        public void Configuration(IAppBuilder app)
        {
            try
            {
                log.Debug(LoggingConstants.Begin);

                string redisServer = ConfigurationManager.AppSettings["redis:server"];

                int redisPort = Convert.ToInt32(ConfigurationManager.AppSettings["redis:port"]);

                HubConfiguration configuration = new HubConfiguration();
                configuration.EnableDetailedErrors = true;
                configuration.EnableJavaScriptProxies = false;
                configuration.Resolver = GlobalHost.DependencyResolver.UseRedis(redisServer, redisPort, string.Empty, "MyApp");

                app.MapSignalR("/signalr", configuration);   

                log.Info("SIGNALR - Startup Complete");
            }
            finally
            {
                log.Debug(LoggingConstants.End);
            }
        }

    }

}

I downloaded the client source code and wired it in directly instead of the NuGet package so I could step through everything. It seems the client negotiates successfully, then attempts to "connect" with the SSE (Server-Sent Events) and then LongPolling transports, but fails with both.

Question 1.1

Does anyone know of an alternative to SignalR for .NET that supports scaling with load balancing in a less "I want to pull my hair out" kind of way?

wakurth

2 Answers

The problem was fixed by switching the profile for "MyApp" in the F5 to one that uses the F5's built-in "source_addr" profile as its parent profile, with a timeout of 1 hour. Here is a description of what that profile does:

Source address affinity persistence: Also known as simple persistence, source address affinity persistence supports TCP and UDP protocols, and directs session requests to the same server based solely on the source IP address of a packet.

EDIT

This ended up "working" for a while, but if I deploy a publisher (something that simply publishes through the SignalR client) without republishing the hub, the publisher times out trying to connect over and over again. Ugh.

wakurth
  • Yeah, I experienced something similar last year just by implementing async GET requests on regular ASP.NET pages. The F5 device would start a session, the user would click on an async postback, and the server that got the request wasn't the one that served up the page in the first place, so... it didn't work. The word on the street at the time was "we don't support sticky sessions"... But get this: rather than fix it, they rolled back all the async support in the applications! Go Forward 4, Go Back 6... – JWP Nov 14 '14 at 19:51
  • @user1522548 - yeah, I hear ya. – wakurth Nov 14 '14 at 23:29

It should not be necessary to configure source address affinity to use SignalR behind a load balancer. It's certainly not wrong to set up session affinity, but that doesn't fix your underlying problem.

If you look closely at the content of the 400 response, you will probably see a message similar to "The ConnectionId is in the incorrect format."

SignalR uses the server's machine key to create an anti-CSRF token, which requires all the servers in your farm to share a machine key so that the token can be decrypted when SignalR requests hop between servers. The /negotiate request that you see succeed is the one that retrieves the anti-CSRF token. When the SignalR client then uses that token to make a /connect request, the request fails because it is processed by a different server, one that didn't create the token and is unable to decrypt it.

This explains why setting up session affinity fixed your problem, but sharing a machine key will help you avoid this problem even if something goes wrong with session affinity.
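
As a rough sketch of what sharing a machine key looks like, each server's web.config would carry the same `<machineKey>` element. The key values below are placeholders, not real keys, and the `HMACSHA256`/`AES` algorithm choices are an assumption for illustration; generate your own keys and use whatever algorithms suit your environment.

```xml
<configuration>
  <system.web>
    <!-- Placeholder values: generate your own keys, then deploy the
         identical element to every server behind the load balancer. -->
    <machineKey validationKey="REPLACE_WITH_SHARED_VALIDATION_KEY"
                decryptionKey="REPLACE_WITH_SHARED_DECRYPTION_KEY"
                validation="HMACSHA256"
                decryption="AES" />
  </system.web>
</configuration>
```

With identical keys on both machines, a token issued during /negotiate on one server can be decrypted on the other, regardless of where the load balancer sends the follow-up request.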

Here is an issue that was filed on GitHub by someone who experienced a similar problem: https://github.com/SignalR/SignalR/issues/2292.

halter73
  • Interesting, thanks for the link and explanation. I did not see "The ConnectionId is in the incorrect format" anywhere in the response or its nested responses, but it certainly makes sense that your example could be what was actually happening. – wakurth Nov 14 '14 at 23:26
  • It can be a little tricky to read the response body when inspecting an HttpClientException/WebException since the response body is still a stream. The easiest way to get at it is probably by accessing `ex.GetError().ResponseBody` where [GetError()](http://msdn.microsoft.com/en-us/library/microsoft.aspnet.signalr.client.errorextensions.geterror(v=vs.118).aspx) is an extension method for Exception defined by SignalR in the Microsoft.AspNet.SignalR.Client namespace. – halter73 Nov 15 '14 at 03:42
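
A minimal sketch of that approach, reusing the URL and hub name from the question's test client. It assumes the failure surfaces as an exception thrown by `Start().Wait()`, and that the object returned by `GetError()` exposes the `StatusCode` and `ResponseBody` values printed here; treat it as an illustration rather than a definitive diagnostic tool.

```csharp
using System;
using Microsoft.AspNet.SignalR.Client;

class Program
{
    static void Main(string[] args)
    {
        var connection = new HubConnection("http://myappalias/signalr");
        connection.Credentials = System.Net.CredentialCache.DefaultNetworkCredentials;
        connection.CreateHubProxy("MyAppHub");

        try
        {
            connection.Start().Wait();
            Console.WriteLine("Connected.");
        }
        catch (Exception ex)
        {
            // GetError() is the SignalR client extension method mentioned above;
            // it reads the buffered response so the body is available as a string.
            var error = ex.GetError();
            Console.WriteLine("Status code: {0}", error.StatusCode);
            Console.WriteLine("Response body: {0}", error.ResponseBody);
        }
    }
}
```

If the body contains a message such as "The ConnectionId is in the incorrect format.", that points at the machine key mismatch described in the answer rather than at the load balancer itself.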