10

In our line of business we are hosting a REST based API that is hosted by Windows Azure and with SQL Azure as database storage.

Both the Web Role (Windows 2008R2, IIS 7.5, WCF, Large instance) and SQL Azure is hosted in North Europe region.

The problem is, that when we do intensive SQL work we often get a "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.".

What troubles me here is, that no matter what we do, we cannot provoke this on our on-premise SQL servers (SQL Server 2008R2).

Any help in clarifying this mystery is appreciated as it seems that the Web Role instance is not directly talking to the SQL Azure instance although both are located in North Europe.

A more detailed exception:

<SqlException>
    <Message>Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.</Message>
    <StackTrace>
        <Line>at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)</Line>
        <Line>at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning()</Line>
        <Line>at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)</Line>
        <Line>at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()</Line>
        <Line>at System.Data.SqlClient.SqlDataReader.get_MetaData()</Line>
        <Line>at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)</Line>
        <Line>at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)</Line>
        <Line>at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)</Line>
        <Line>at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)</Line>
        <Line>at System.Data.SqlClient.SqlCommand.ExecuteScalar()</Line>
        <Line>at SyncInvokeAddCollaboratorFieldInstance(Object , Object[] , Object[] )</Line>
        <Line>at System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(Object instance, Object[] inputs, Object[]&amp; outputs)</Line>
        <Line>at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(MessageRpc&amp; rpc)</Line>
        <Line>at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5(MessageRpc&amp; rpc)</Line>
        <Line>at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage31(MessageRpc&amp; rpc)</Line>
        <Line>at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)</Line>
    </StackTrace>
    <UserDefinedInformation>
        <HelpLink.ProdName><![CDATA[Microsoft SQL Server]]></HelpLink.ProdName>
        <HelpLink.ProdVer><![CDATA[11.00.2065]]></HelpLink.ProdVer>
        <HelpLink.EvtSrc><![CDATA[MSSQLServer]]></HelpLink.EvtSrc>
        <HelpLink.EvtID><![CDATA[-2]]></HelpLink.EvtID>
        <HelpLink.BaseHelpUrl><![CDATA[http://go.microsoft.com/fwlink]]></HelpLink.BaseHelpUrl>
        <HelpLink.LinkId><![CDATA[20476]]></HelpLink.LinkId>
    </UserDefinedInformation>
</SqlException>
gimlichael
  • 1,365
  • 1
  • 15
  • 28
  • Identical indexes in the two databases? Yes, Azure SQL is most likely slower for simple queries due to latency, but 8-15 times sounds pretty steep for the same database schema. – Joachim Isaksson Sep 25 '12 at 13:40
  • Yep, "identical" schemas where SSMS 2012 generate the script for SQL Azure. "Identical" because the generated script is not 1-1 with SQL Server 2008R2. I can - to some point - understand the latency, but should this not be "eliminated" when both the Web and SQL is in same region? – gimlichael Sep 25 '12 at 13:56
  • And you are right; the 8-15 was exaggerated .. it's more 4-8 times slower (different scenarios; for this it's "only" 4-5 times slower), but with missing records do to timeout. – gimlichael Sep 25 '12 at 14:02
  • 1
    Regarding the latency, I suspect a lot of the latency isn't in the network but that in that the database servers are shared, so if someone else is using a lot of memory, you may not get all the cache memory you'd get from a "dedicated" server. There's an answer [here](http://stackoverflow.com/a/11575030/477878) that may be of interest, it may explain some of the issues but not really help with a solution. – Joachim Isaksson Sep 25 '12 at 14:11
  • 1
    [This](http://blogs.msdn.com/b/sqlazure/archive/2010/05/27/10016392.aspx) may help if you can install Management Studio on the Azure hosted machine. It should give more reliable results than a plain network measurement. – Joachim Isaksson Sep 25 '12 at 14:20

1 Answers1

6

If you need to do SQL intensive work (for instance, a lot of INSERT statements in a normalized OLTP database) you need to have fail-over logic in your code.

On-premise SQL server will not suffer from this, so take this into consideration before switching to SQL Azure.

These two articles inspired me (special thanks to Joachim Isaksson for guidance):

http://blogs.msdn.com/b/sqlazure/archive/2010/05/11/10011247.aspx

http://social.msdn.microsoft.com/Forums/en-US/ssdsgetstarted/thread/7a50985d-92c2-472f-9464-a6591efec4b3/

To sum of the result, I have provided the two results which is now identical in result (where before some records where not added do to missing fail-over logic in regards to the original question: Timeout Expired):

On-premise SQL Server: 179.285 records in 427 seconds

SQL Azure w. fail-over logic: 179.285 records in 2.247 seconds - a whooping 5.2 times slower!

I hope this can help others struggling with SQL Azure. On a positive note; you learn (the hard way) that you have been lucky and privileged on your native in-house applications :-)

Note: i would still like an explanation how this can happen .. seems to easy to blame on latency, no?

gimlichael
  • 1,365
  • 1
  • 15
  • 28