35

At a high level, here is what is happening:

  1. We have two SQL Server 2008 R2 SP1 systems (Standard Edition on Windows NT 6.1 (Build 7601: Service Pack 1)) They are humming along just fine, communicating bi-directionally with no errors or issues.
  2. We reboot system #2, expecting that any Service Broker messages sent to it while it is unavailable will queue up on system #1, until system #2 comes back up.
  3. System #2 comes back up and everything there starts normally with no errors.
  4. The messages that queued up on system #1 for system #2 remain queued up; they are never sent. Furthermore, new messages on that conversation also queue up and are never sent.
  5. Messages sent on new conversations are transmitted just fine.

Details about the messages that are never sent:

A. While system #2 is down, the transmission_status for the messages in the queue show various errors indicating that it cannot communicate with system #2, as expected.

B. Shortly after system #2 comes back up, the transmissions_status for those messages goes blank. The blank status never changes after this point.

C. The conversation where messages stack up is in the CONVERSING/CO state. No columns in the system view indicate anything is any different from other queues that are working fine. (If I could find any flags set differently, I would know to terminate the bad conversation, but the system offers no clues--other than the ever-growing queue depth.)

D. The messages are never received on system #2, in the sense that my activation stored procedure is never called for these messages.

E. In Profiler (with all Broker trace types turned on), a good conversation shows these things being logged:

Broker:Conversation CONVERSING  1 - SEND Message        Initiator                                       
Broker:Message Classify 2 - Remote  Initiator
[SQL Batch complete; SQL that caused the SEND to occur]
Broker:Remote Message Acknowledgement   1 - Message with Acknowledgement Sent   Initiator
Broker:Message Classify     1 - Local   Initiator
Broker:Conversation CONVERSING  6 - Received Sequenced Message  Target
Broker:Remote Message Acknowledgement   3 - Message with Acknowledgement Received       Initiator
Broker:Activation       Microsoft SQL Server Service Broker Activation  1 - Start

A message being sent which is destined to get stuck shows only the first two of those events:

Broker:Conversation CONVERSING  1 - SEND Message    Initiator
Broker:Message Classify 2 - Remote  Initiator

As far as I can tell, this is all the farther those messages get. There is no indication that SQL Server tries to transmit them ever again. System #1 thinks the conversation is still good, but System #2 has forgotten it completely. System #1 never seems to figure this out. If we subsequently reboot system #1, then everything is back to normal with all messags flowing as intended.

I have considered that these messages have actually been sent, but that the acknowledgement is not making it back to system #1. But I don’t see any evidence of backed up queues of acknowledgements.

We have checked for numerous typical issues on both sides:

Broker is enabled on both sides. 2. All queues are on, with all appropriate things enabled (enqueue, receive). Queues are not poisoned. 3. No permissions issues exist that we know of. 4. We are not using fire-and-forget. 5. We are reusing conversations, as various people recommend doing. (In fact, conversation re-use is the problem here!) 6. We are trapping SQL exceptions, using transactions as instructed, etc. 7. ssbdiagnose returns no errors.

When a SQL Server host is rebooted, we expect that any queued up messages will eventually get sent, but they are not. What is going on here??

user1745937
  • 461
  • 4
  • 6
  • 1
    Can you attach Profiler on Target machine to see what is happening after reboot? Errors should be raised on its side. – Dalex Nov 20 '12 at 08:00
  • You should also try listening to "Security Audit --> Audit Broker Conversation" and "Security Audit --> Audit Broker Login" events. Please ensure that you do it on both sides. Also, it is interesting that SSBDiagnose.exe didn't detect any SSB configuration issues. – Nabheet Dec 13 '12 at 22:40
  • 1
    I've seen multiple issues with the broker when it's rebooted or not working. Finally I decided to use an ordinary queue, insert a record in the queue and use the message broker to send the queue item id. Then I also wrote a periodic queue handler. So in 99% of the cases everything works just fine with the message broker, in the 1% I miss messages the (time based) queue handler picks it up. – Paul Jan 02 '14 at 09:02
  • Do you have an activation stored procedure to process messages ? – Rom Eh May 27 '14 at 11:11
  • I think you would have better luck posting/migrating this to SE Database Administrators page. http://dba.stackexchange.com/ – Phrancis May 29 '14 at 16:35

1 Answers1

3

I understand this is a quite old thread, but I have combated exactly the same situation before, and in my case the network configuration was the culprit.

For some reason, the initiator has sent its messages from one IP address, but another IP has been opened to accept incoming replies (and this second IP has been specified in target's route).

I have detected this by accident, really. When I tried to end conversation on the target side, it hasn't closed, but the EndDialog message appeared in sys.transmission_queue with the status:

Connection attempt failed with error: '10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)'.

I have no idea why the target restart has triggered the breakdown, but when network engineers have fixed the issue and I changed the target's route, everything flew to their destinations as it was supposed from the start.

Roger Wolf
  • 7,307
  • 2
  • 24
  • 33
  • FYI this very error message will be logged to profile as [Broker:Connection Event Class](http://technet.microsoft.com/en-us/library/ms190760(v=sql.110).aspx). [`ssbdiagnose.exe`](http://msdn.microsoft.com/en-us/library/bb934450.aspx) `RUNTIME` should also capture this event and report it, along with possibly more diagnostics. – Remus Rusanu Aug 12 '14 at 11:33
  • @RemusRusanu - shame to me, I didn't know about this tool back then. And we also tried to implement reusable dialogs, so closing them on target side didn't look like an obvious thing to do. – Roger Wolf Aug 12 '14 at 13:16