3

I'm trying to get my head around the usage of the WITH statement in sql azure. I have an initial query, which then needs to be filtered 2 times. If I just run the initial query, it runs fast. But as soon as I add the additional filters, the query runs so slow it never finishes and azure force closes the connection.

So, the first WITH statement, which on it's own runs very quick ->

CREATE TYPE BrokerAccountAndTimeType as TABLE
(
    [BrokerAccountId] [bigint],
    [TimeUTC] [datetime]
);
GO

CREATE PROCEDURE [dbo].[GetLatestOpenBrokerAccountTradesByBrokerAccountAndTime]
@UserId [int],
@BrokerAccountAndTime BrokerAccountAndTimeType READONLY
AS
BEGIN
WITH trades as (SELECT bats.*
                FROM BrokerAccountTrades bats
                where bats.[TradeTimeUTC] = (SELECT MAX([TradeTimeUTC]) 
                           FROM BrokerAccountTrades bats2
                           inner join @BrokerAccountAndTime bat
                           on (bats.UserId = @UserId and bats2.BrokerAccountId = bat.BrokerAccountId)
                           WHERE bats2.UserId = bats.UserId
                           AND bats2.BrokerAccountId = bats.BrokerAccountId
                           AND bats2.SecurityId = bats.SecurityId
                           AND bats2.[TradeTimeUTC] < bat.TimeUTC))

    select *
    from trades

END

This doesn't cause a problem - this query runs fast enough in a few seconds. However, if I then add the additional filtering, to get the records I actually want out of the 'trades' result, everything just starts going really slow. This seems counter-intuitive. If sql server simply ran the queries sequentially, there wouldn't be a problem and the result would come back really fast.

CREATE PROCEDURE [dbo].[GetLatestOpenBrokerAccountTradesByBrokerAccountAndTime]
@UserId [int],
@BrokerAccountAndTime BrokerAccountAndTimeType READONLY
AS
BEGIN

--initial query
WITH trades as (SELECT bats.*
                FROM BrokerAccountTrades bats
                where bats.[TradeTimeUTC] = (SELECT MAX([TradeTimeUTC]) 
                           FROM BrokerAccountTrades bats2
                           inner join @BrokerAccountAndTime bat
                           on (bats.UserId = @UserId and bats2.BrokerAccountId = bat.BrokerAccountId)
                           WHERE bats2.UserId = bats.UserId
                           AND bats2.BrokerAccountId = bats.BrokerAccountId
                           AND bats2.SecurityId = bats.SecurityId
                           AND bats2.[TradeTimeUTC] < bat.TimeUTC)),

    --filter the results from the 'trades' query    
    trades2 as (select t.*
    FROM trades t
    where t.ExternalId = (select max(ExternalId)
                        from trades t2
                        where t.userid = t2.userid
                        and t.brokeraccountid = t2.brokeraccountid
                        and t.securityid = t2.securityid))

    --filter the results from the 'trades2' query
    select t3.*
    from trades2 t3
    where t3.OpenClose = (select max(CONVERT(int,OpenClose))
                        from trades2 t4
                        where t4.userid = t3.userid
                        and t4.brokeraccountid = t3.brokeraccountid
                        and t4.securityid = t3.securityid)
    and t3.NewPosition <> 0

END

Does anyone know what the problem might be here? And is there a way to force the queries to be run sequentially? I could just return the first query to my code and filter out the unwanted rows in code, but that seems a really ugly fix.

And just for those trying to understand the query. This is just getting the latest records before a specified time, where a different time may be provided for each account that is being queried (thus the table valued parameter). The further filtering is required as certain records may share the same timestamp, so it is necessary to apply the further filtering to determine which is the 'latest' of those records sharing the same timestamp.

DaManJ
  • 373
  • 2
  • 15
  • 1
    There is no "order of execution" of WITH'ed (CTE) queries - note how they are prevented from running DML. In effect the SQL Engine just slaps them down below (it does *not* "build an intermediate result set" that is reused) - now, what does the *query plan* show is the problem? – user2864740 Oct 13 '14 at 05:41
  • Materialize your CTE that may be your solution, despite nay-sayers saying differently. HTH. – TT. Oct 13 '14 at 06:01
  • @TT i've tried the materialization hack, but it doesn't work on Azure... I guess it's the same reason as it doesn't work on 2012 either. http://stackoverflow.com/questions/13090037/t-sql-cte-materializing-techniques-not-working-on-sql-server-2012 – DaManJ Oct 13 '14 at 06:42
  • @DaManJ I'm not familiar with SQL Server Azure, but is it possible to use table variables (as you claim you cannot use temporary tables)? – TT. Oct 13 '14 at 06:57
  • @DaManJ I've come across [this thread](https://stackoverflow.com/questions/7021483/using-temp-tables-in-sql-azure) regarding temporary tables in SQL Azure. I don't see anything that would prohibit creation/use of temporary tables. Have no personal experience with Azure though, but I don't see anywhere that (local) temporary tables cannot be used. HTH. – TT. Oct 13 '14 at 07:10
  • @TT I was referring more to the usage of [SELECT into #temp from...] which is a way to create a temp table without having to define a schema http://feedback.azure.com/forums/217321-sql-database/suggestions/1410637-add-support-for-select-into-for-temp-tables-in – DaManJ Oct 13 '14 at 07:21
  • @DaManJ You can simply `CREATE TABLE #temp...` and then `INSERT INTO #temp...`, that is supported from what I gather from information I gathered on the internets. – TT. Oct 13 '14 at 07:22

1 Answers1

2

Ok, so i have 'solved' this by using a table variable which I find really distasteful. It means I have to re-define the schema of an existing table, in a stored procedure, which creates a maintenance burden of having to update this stored proc should I ever need to modify the table it is referencing such as adding new columns etc. YUK...

CREATE PROCEDURE [dbo].[GetLatestOpenBrokerAccountTradesByBrokerAccountAndTime]
@UserId [int],
@BrokerAccountAndTime BrokerAccountAndTimeType READONLY
AS
BEGIN

DECLARE @tempTrades TABLE
(
[Id] [bigint] NOT NULL PRIMARY KEY,
[UserId] [int] NOT NULL,
[BrokerAccountId] [bigint] NOT NULL,
[SecurityId] [tinyint] NOT NULL,
[TradeTimeUTC] [datetime] NOT NULL,
[OpenClose] [bit] NOT NULL,
--many more columns...
)

insert into @tempTrades
SELECT bats.*
                FROM BrokerAccountTrades bats
                where bats.[TradeTimeUTC] = (SELECT MAX([TradeTimeUTC]) 
                           FROM BrokerAccountTrades bats2
                           inner join @BrokerAccountAndTime bat
                           on (bats.UserId = @UserId and bats2.BrokerAccountId = bat.BrokerAccountId)
                           WHERE bats2.UserId = bats.UserId
                           AND bats2.BrokerAccountId = bats.BrokerAccountId
                           AND bats2.SecurityId = bats.SecurityId
                           AND bats2.[TradeTimeUTC] < bat.TimeUTC);

    --filter the results from the 'trades' query    
    WITH trades2 as (select t.*
    FROM @tempTrades t
    where t.ExternalId = (select max(ExternalId)
                        from @tempTrades t2
                        where t.userid = t2.userid
                        and t.brokeraccountid = t2.brokeraccountid
                        and t.securityid = t2.securityid))

    --filter the results from the 'trades2' query
    select t3.*
    from trades2 t3
    where t3.OpenClose = (select max(CONVERT(int,OpenClose))
                        from trades2 t4
                        where t4.userid = t3.userid
                        and t4.brokeraccountid = t3.brokeraccountid
                        and t4.securityid = t3.securityid)
    and t3.NewPosition <> 0

END
DaManJ
  • 373
  • 2
  • 15
  • the day you add a column to your table BrokerAccountTrades, this procedure will fail. – t-clausen.dk Oct 13 '14 at 07:18
  • You can use temporary tables as I outlined in comments on your question. – TT. Oct 13 '14 at 07:23
  • @t-clausen.dk which is why i hate this and said 'YUK'. see http://feedback.azure.com/forums/217321-sql-database/suggestions/1410637-add-support-for-select-into-for-temp-tables-in – DaManJ Oct 13 '14 at 07:23
  • 1
    Always use explicit columns in your SELECT statement instead of *, it will not break then. – TT. Oct 13 '14 at 07:25
  • Also, temporary tables especially if you have many rows will outperform table variables. SQL Server does not maitain statistics on table variables, it does for temporary tables. If the number of rows you insert into the table variable increases at some point, you may experience degraded performance again. But as long as you got that WITH out of the equation, you are already better off... – TT. Oct 13 '14 at 07:29
  • @TT it is better that it breaks than the business layer silently not get the new columns if i would update the table and forget to update the stored proc. Also just on table valued parameter vs temp-table, it offers better performance. – DaManJ Oct 13 '14 at 07:31
  • @DaManJ That is up to your preference of course. The question is already posted to Microsoft to support the `SELECT ... INTO #tt`, you can only hope they support it in some next Service Pack. In any case, I never do a `SELECT * ...` exactly for the reason that it may break functionality if you unknowingly extend the table with extra columns. Only in specific cases do I use `SELECT *...`, eg in an `EXISTS/NOT EXISTS` construct. GL! – TT. Oct 13 '14 at 07:34
  • @TT well I would have thought it is better that it breaks in DEV environment and it can be fixed there rather then it silently get deployed to production and not return the values of the new columns and have some unknown affect in the business layer. But anyway thanks for your help your links pointed me to more info on people having the same problem with CTE and no 'nice' solutions. Thanks microsoft for making this so hard. – DaManJ Oct 13 '14 at 07:41
  • @DaManJ Odds are your product is so big, and your testing process not perfect. In that case you want your installs to run smooth and not break, even if that means a column or two aren't visible yet. I'd rather not have things break when installed at clients. But if your testing process is very good then your way would be better. In any case, `SELECT *` is a no-no if you search around for some guidelines on how to write proper SQL you'll see. – TT. Oct 13 '14 at 08:24