11

I got a Windows service listening to inserts in a table using the SqlDependency class.

It works fine for several days but then suddenly stops working.

In the normal scenario, I receive change events

e.Type = SqlNotificationType.Change
e.Info = SqlNotificationInfo.Insert
e.Source = SqlNotificationSource.Data

If nothing changes, I get timeout events every 3600 seconds

e.Type = SqlNotificationType.Change
e.Info = SqlNotificationInfo.Error
e.Source = SqlNotificationSource.Timeout

or (don't know why there are two different timeout events)

e.Type = SqlNotificationType.Change
e.Info = SqlNotificationInfo.Unknown
e.Source = SqlNotificationSource.Timeout

This can work for a week or more, but then suddenly, I don't receive change events anymore and instead receive an event every 60 seconds with

e.Type = SqlNotificationType.Change
e.Info = SqlNotificationInfo.Error
e.Source = SqlNotificationSource.Client

The MSDN documentation for SqlNotificationSource.Client says

A client-initiated notification occurred, such as a client-side time-out or as a result of attempting to add a command to a dependency that has already fired.

I think this means that a timeout occurred when creating the dependency.

The same code is running all the time and looks like this:

private void CreateDependency() {
    using (var connection = new SqlConnection(_connectionString)) {
        connection.Open();

        var command = new SqlCommand();
        command.CommandText = "SELECT ...";
        command.Connection = connection;

        new SqlDependency(command, "ServiceName", DependencyTimeout).OnChange += OnChange;

        command.ExecuteNonQuery();
    }
}

private void OnChange(object sender, SqlNotificationEventArgs e) {
    ((SqlDependency)sender).OnChange -= OnChange;

    if (e.Type == SqlNotificationType.Change && e.Info == SqlNotificationInfo.Insert) {
        _changeWorkerNotifier.Set(); // AutoResetEvent
    }

    CreateDependency();
}

If I restart my service, it works fine again.

I did some investigation and found out that the errors seem to start after a scheduled backup on the server (which happens every day). Got another application where I around the same time get errors like

A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)

My original guess was that SqlDependency.Start() creates a connection to the server which faults during the scheduled backup and never recovers. But the first line in the MSDN documentation says

The SqlDependency listener will restart when an error occurs in the SQL Server connection.

Any ideas on how to solve this?
(I can of course let the service fail and have the service manager restart it. The problem is that the service does other things as well which need to close down properly so I can't just do Environment.Exit(-1) from the event handler.)

Pang
  • 9,564
  • 146
  • 81
  • 122
adrianm
  • 14,468
  • 5
  • 55
  • 102
  • Try to reproduce it by artificially creating timeouts (suspend sqlservr.exe) or network errors (kill it). – usr Mar 11 '14 at 08:32
  • I have no idea, how `SqlDependency` works, but if it works over some `SqlConnection`, than it's enough to break underlying network connection to get `SqlConnection` in unrecoverable error. I'm surprised, that your code works so long - looks like you have very stable network environment. – Dennis Mar 11 '14 at 08:39

1 Answers1

12

Found the cause and a solution.

First, I found out that the line

The SqlDependency listener will restart when an error occurs in the SQL Server connection.

is only present in the .Net 4 documentation.

Some tests show that it is not just a change in the documentation! The Client/Error event does not appear when running with CLR4.

So the cause is a connection error which is handled inside SqlDependency in .Net 4 but not in earlier versions.

In .Net 2-3.5, it is possible to recover after the error with a SqlDependency.Stop() / SqlDependency.Start().

Don't really like the Stop/Start solution because I then need logic to break the loop if the connection error is not recoverable. I decided to just stop the service in case of error and let the service manager restart it (which makes the problem visible in the event log etc).

My handler now looks like this:

private void OnChange(object sender, SqlNotificationEventArgs e) {
    ((SqlDependency)sender).OnChange -= OnChange;

    if (e.Source == SqlNotificationSource.Timeout) {
        // just restart notification
    }
    else if (e.Source != SqlNotificationSource.Data) {
        Logger.Error("Unhandled change notification {0}/{1} ({2})", e.Type, e.Info, e.Source);
        ServiceRunner.ShutDown(true);
    }
    else if (e.Type == SqlNotificationType.Change && e.Info == SqlNotificationInfo.Insert) {
        _changeWorkerNotifier.Set(); // AutoResetEvent
    }
    else {
        Logger.Log("Ignored change notification {0}/{1} ({2})", e.Type, e.Info, e.Source);
    }

    CreateDependency();
}

** EDIT **

Here is the code I call on startup

private void ClearOldSubscriptions() {
    using (var connection = new SqlConnection(_connectionString))
    using (var command = new SqlCommand()) {
        string sql =
            ////@"DECLARE @UniqueTimeout AS int = 3586; " +
            @"DECLARE @SubscriptionId AS int; " +
            @"DECLARE @Sql AS varchar(max); " +
            @"DECLARE SubscriptionCursor CURSOR LOCAL FAST_FORWARD " +
            @"    FOR " +
            @"        SELECT id " +
            @"        FROM sys.dm_qn_subscriptions " +
            @"      WHERE database_id = DB_ID() " +
            @"            AND timeout = @UniqueTimeout " +
            @"OPEN SubscriptionCursor; " +
            @"FETCH NEXT FROM SubscriptionCursor INTO @SubscriptionId; " +
            @"WHILE @@FETCH_STATUS = 0 " +
            @"BEGIN " +
            @"    SET @Sql = 'KILL QUERY NOTIFICATION SUBSCRIPTION ' + CONVERT(varchar, @SubscriptionId); " +
            @"    EXEC(@Sql); " +
            @" " +
            @"    FETCH NEXT FROM SubscriptionCursor INTO @SubscriptionId; " +
            @"END";

        command.Connection = connection;
        command.CommandType = CommandType.Text;
        command.CommandText = sql;
        command.Parameters.Add("@UniqueTimeout", SqlDbType.Int).Value = DependencyTimeout;

        connection.Open();

        command.ExecuteNonQuery();
    }
}

private void ClearNotificationQueue() {
    using (var connection = new SqlConnection(_connectionString))
    using (var command = new SqlCommand()) {
        string sql = 
            @"DECLARE @Conversation AS uniqueidentifier; " +
            @"DECLARE ConversationCursor CURSOR LOCAL FAST_FORWARD  " +
            @"    FOR " +
            @"        SELECT conversation_handle  " +
            @"        FROM {@Queue} " +
            @"     " +
            @"OPEN ConversationCursor; " +
            @"FETCH NEXT FROM ConversationCursor INTO @Conversation; " +
            @"WHILE @@FETCH_STATUS = 0  " +
            @"BEGIN " +
            @"    END CONVERSATION @Conversation WITH CLEANUP; " +
            @" " +
            @"    FETCH NEXT FROM ConversationCursor INTO @Conversation; " +
            @"END " +
            @"";
        sql = sql.Replace("{@Queue}", NotificationQueue);

        command.Connection = connection;
        command.CommandType = CommandType.Text;
        command.CommandText = sql;

        connection.Open();

        command.ExecuteNonQuery();
    }
}
Pang
  • 9,564
  • 146
  • 81
  • 122
adrianm
  • 14,468
  • 5
  • 55
  • 102
  • 1
    This is great stuff - thanks! You should mark as the solution too ;) – tom redfern Sep 11 '14 at 13:00
  • For example, what does _changeWorkerNotifier.Set(); do? – tom redfern Sep 11 '14 at 13:11
  • 1
    If you don't mind posting your code I'd be grateful ;) – tom redfern Sep 11 '14 at 13:27
  • @TomRedfern, what code do you want me to post? `_changeWorkerNotifier.Set()` notifies the worker thread there has been an insert in the monitored table. – adrianm Sep 12 '14 at 06:18
  • FYI I'm getting similar behavior - see https://connect.microsoft.com/SQLServer/feedback/details/543921/sqldependency-incorrect-behaviour-after-sql-server-restarts - but I'm using .net 4. I am just wondering if there's anything I can do in my code to mitigate it – tom redfern Sep 12 '14 at 07:43
  • I use the above solution in .Net 4 as well (i.e restart my application on all error events). One thing to watch out for is orphaned dependencies in sqlserver. They will eventually fire or timeout and cause an error event. I added code to delete all orphaned dependencies on program startup and it has worked fine since then. (AFAIK there is no way to find out which dependencies are orphaned. I just use a magic timeout value, e.g. 3578s, on my dependencies and delete all with that timeout value) – adrianm Sep 14 '14 at 05:26
  • @TomRedfern, and if you don't want to restart your application on errors I assume Stop(), delete orphaned, Start() would work as well. I have not tried it. – adrianm Sep 14 '14 at 05:43
  • @TomRedfern, regarding your connect issue. Maybe you got the same problem as I did. When a change was detected an orphaned dependency triggered first and caused an error event. I restarted on the error event and the "active" dependency became orphaned as well and I the real event became an error. It would be so much easier if the error event included information on if it comes from the active dependency or is a warning "your dependency is fine, i just received a message from an unknown dependency". – adrianm Sep 14 '14 at 06:05
  • not clear this line if (e.Source == SqlNotificationSource.Timeout) { // just restart notification } restart notification means ?? – Thomas Oct 06 '14 at 17:28
  • what this line does else if (e.Source != SqlNotificationSource.Data) { ServiceRunner.ShutDown(true);} this line ServiceRunner.ShutDown trying to restart the service from within the service code ?? – Thomas Oct 06 '14 at 17:29
  • @Thomas, Restart means `new SqlDependency(...).OnChange += OnChange;` which happens in the `CreateDependency()` method – adrianm Oct 07 '14 at 06:16
  • @Thomas, `ServiceRunner.Shutdown()` just stops the service. I have set up recovery in the service settings to have the service manager start the service again. – adrianm Oct 07 '14 at 06:20
  • How to setup recovery in service setting to have the service manager start it again. Share the idea or redirect me to any detail article for this purpose. – Thomas Oct 07 '14 at 13:17
  • @Thomas, Your service/right click/properties/recovery tab. just remember to have your service return an error (i.e. `Environment.Exit(1)`) to trigger the recovery. – adrianm Oct 07 '14 at 14:17
  • @Adrian : you said to use Environment.Exit(1) but u did not use it in ur code rather u use ServiceRunner.Shutdown() when error occur. do u want me to write this line (e.Source != SqlNotificationSource.Data) { ServiceRunner.ShutDown(true);Environment.Exit(1);} or (e.Source != SqlNotificationSource.Data) { Environment.Exit(1);} – Thomas Oct 07 '14 at 19:03
  • what is the meaning of these line (e.Source != SqlNotificationSource.Data) when this condition will occur ? – Thomas Oct 07 '14 at 19:04
  • one Ernest request that please see my thread http://stackoverflow.com/questions/26211049/issue-regarding-sqldependency-used-in-windows-service look my overall code and specially edit section and tell me does it work properly or do i need to plug more code to run it smoothly. if possible give me rectified version of my code there in my thread. thanks – Thomas Oct 07 '14 at 19:05
  • @Thomas, The SqlDependency documentation is not very specific on what events occur when. When I get a notification which is not `SqlNotificationSource.Data` I just assume something is wrong and restart the service. – adrianm Oct 08 '14 at 05:57
  • i search google and come to know that we can restart service from with in service. just tell me how you are restring service when error occur?. give me the code which could restart the service when error occur in service. thanks – Thomas Oct 08 '14 at 07:34
  • what is ServiceRunner? ServiceRunner.ShutDown(). ServiceRunner is your custom code or buil-in class? – Thomas Oct 08 '14 at 07:36
  • @Thomas the shutdown code is what causes the host process to recycle. In my case it was a windows service so I was able to use Environment.Exit(1); and set the windows service recovery options to restart on failure. – tom redfern Oct 29 '14 at 16:57
  • @adrianm : would u mind to say when u r calling these function ClearOldSubscriptions() , ClearNotificationQueue() ? i am working windows service with sql dependency which will be in running state for 24X7. so i just like to know can i copy your above 2 functions ClearOldSubscriptions() , ClearNotificationQueue() to paste in my code. is there will be any problem if i call these function ClearOldSubscriptions() and ClearNotificationQueue() from OnChange() event handler just to clear all old notification ? please guide me coz i am facing problem regarding sql dependency. – Mou Apr 27 '15 at 13:14
  • I call them on service startup only but I restart my service everytime I receive an unknown/error event since I don't know how to handle them. If you don't restart your service I assume you should call them before `SqlDependency.Start`. – adrianm Apr 28 '15 at 07:07
  • thats some good investigation @adrianm. l am facing the similar issue, looking at your cleanup code, it seems you are creating your own queues/service and giving timeout to subscriptions and not using default queues, subcription that sql server creates when you setup a dependency. Is this correct? – Nitin Nov 25 '15 at 12:25
  • @Nitin, I created my own queues and service. Didn't know sql server could create them for you. I have moved away from notifications in new projects (they still work fine in the old projects). I now have a model where I use my own triggers to put change messages into a queue. It is a lot more flexible on what changes to report and I can also include relevant data in the message. – adrianm Nov 25 '15 at 14:38
  • @adrianm yes it creates for you if you create SqlDependency whithout the last two parameters. It saves from maintaining the queues and services and handle their life cycle for you. But as you rightly said it does not give you flexibility. This is some good investigation you did here . +1 from my side. – Nitin Nov 26 '15 at 06:16