4

I have a subquery (LastActivityOn) that I'd like to use in three places, my projection (SELECTed output), ORDER BY, and WHERE clause.

SELECT TOP 175
  (SELECT MAX(ActivityDate) FROM (VALUES
    (UserRegistration.CreatedOn),
    (UserRegistration.ActivatedOn),
    (UserRegistration.LastLoginOn),
    (UserRegistration.UpdatedOn),
    (UserProfile.LastPostedOn)) AS AllDates(ActivityDate)) LastActivityOn,
  UserRegistration.FirstName,
  UserRegistration.LastName,
  [15 more columns of various calculated distances, coalesces, etc...]
FROM
  UserRegistration
  INNER JOIN UserProfile ON UserRegistration.Id = UserProfile.RegistrationId
  INNER JOIN (
    SELECT PostalCode, GeoCenter, PrimaryCity, StateOrProvince
    FROM PostalCodes 
    WHERE @OriginPostalCode IS NULL OR PostalCodes.GeoCenter.STDistance(@OriginPoint) < @WithinMeters
  ) AS ProximalPostalCodes ON ProximalPostalCodes.PostalCode = UserRegistration.PostalCode
  [7 more joins including full-text queries]
WHERE
  LastActivityOn > @OldestUserToSearch AND
  [20 more lines of filtering logic]
ORDER BY
  LOG(DATEDIFF(WEEK, LastActivityOn, @Today))/LOG(2),
  FullTextRelevance

Note the three occurrences of LastActivityOn. Also note that the LastActivityOn subquery references two tables. I suppose because it depends on the join clause in the parent query, it is inherently a correlated subquery?

When I was only taking the maximum of two dates via a User-Defined-Function, I was able to use the resulting value in my WHERE and ORDER BY. Now I cannot.

It seems like I have a few options... I could wrap the whole thing in another query, repeating the projection with just the added activity. It seems like I may be able to use "WITH" (a CTE) in the same way.

But because I don't understand clearly the rules of when I can and cannot use a subquery the way I want to, I could easily be missing something. Any ideas?

Or maybe SQL SERVER will be smart enough to only perform the calculation once for each output row, and I shouldn't worry about it?

EDIT: Currently running SQL Server 2008 Standard, but an upgrade will be in-order at some point. Also, RE: the log function - I'm working to combine with with relevance as a weighted total, so that's a work-in-progress. I'll either trim it with INT to use as a type of ranking, or add it to relevance with a linear adjustment.

CORRECTION: I was able to use the subquery alias in my ORDER BY, but not with any additional calculations or in the where clause. Thanks to ypercube for pointing that out.

shannon
  • 8,664
  • 5
  • 44
  • 74

3 Answers3

6

I don't try to modify your query, but may be common table expression is what you need.

ceth
  • 44,198
  • 62
  • 180
  • 289
  • 1
    I don't see how a CTE ("WITH" clause) is different than a subquery in the FROM clause. With the CTE, I'd wrap my existing query in a WITH clause, but move the LastActivityOn WHERE and ORDER BY to the outer select, and repeat the projection there, correct? So that would be functionally the same as moving the entire query into a subselect, right? Let me know if I missed something please. – shannon Aug 06 '12 at 10:56
  • It looks like OUTER APPLY may do what I need with less code, I'm evaluating the performance now and will post the results. – shannon Aug 06 '12 at 13:35
  • After quite a lot of work on this problem, a CTE didn't have any functional difference from a subquery in this case. Semantically it looked slightly different, in that it required a few extra keywords in addition to the duplicated projection. Thanks for the suggestion though. – shannon Aug 07 '12 at 08:34
2

You cannot use the LastActivityOn alias in the WHERE clause but you can use it in the ORDER BY.

If you want to not repeat the code in 2 places (SELECT and WHERE), you can either use a CTE or select this LastActivityOn result - and the whole subquery - in a derived table and then use it in the external level:

SELECT TOP 175
  LastActivityOn,
  FirstName,
  LastName,
  ...
FROM
    ( SELECT
        ( SELECT MAX(ActivityDate) 
          FROM 
            ( VALUES
                (UserRegistration.CreatedOn),
                (UserRegistration.ActivatedOn),
                (UserRegistration.LastLoginOn),
                (UserRegistration.UpdatedOn),
                (UserProfile.LastPostedOn)
            ) AS AllDates(ActivityDate)
        ) LastActivityOn,
        UserRegistration.FirstName,
        UserRegistration.LastName,
        [15 more columns of various calculated distances, coalesces, etc...]
      FROM
        UserRegistration
        INNER JOIN UserProfile ON UserRegistration.Id = UserProfile.RegistrationId
        INNER JOIN (
          SELECT PostalCode, GeoCenter, PrimaryCity, StateOrProvince
          FROM PostalCodes 
          WHERE @OriginPostalCode IS NULL 
             OR PostalCodes.GeoCenter.STDistance(@OriginPoint) < @WithinMeters
        ) AS ProximalPostalCodes 
            ON ProximalPostalCodes.PostalCode = UserRegistration.PostalCode
        [7 more joins including full-text queries]
      WHERE
        [20 or more lines of filtering logic]
    ) AS tmp
WHERE
  LastActivityOn > @OldestUserToSearch AND
  [any of the 20 lines that has "LastActivityO"]
ORDER BY
  LOG(DATEDIFF(WEEK, LastActivityOn, @Today))/LOG(2),
  FullTextRelevance ;

SQL-Server will probably be enough clever and not execute the same code twice, but that may depend on the version you are running. The optimizer has advanced a lot from version 2000 to 2012 (and the Express or other editions may not have the same capabilities as the Standard or the Enterprise edition)


Irrelevant to the question but I think that because the LOG() function is monotonic, the:

ORDER BY
  LOG(DATEDIFF(WEEK, LastActivityOn, @Today))/LOG(2)

is equivalent to the simpler:

ORDER BY
  DATEDIFF(WEEK, LastActivityOn, @Today))
ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • Thank you (+1). That was what I referred to in my question as an option I was aware of: "I could wrap the whole thing in another query, repeating the projection with just the added activity". I'd also generally try to avoid raising WHERE predicates to outer queries as much as possible, as SQL Server performance seems to get unpredictable on variables processed after just a few levels of subqueries. I think if a subquery is the way to go, I may be able to wrap JUST the UserRegistration and UserProfile tables at the innermost level. It won't simplify too much, since most of my columns are there. – shannon Aug 06 '12 at 11:23
  • You should examine the execution plans of the 2 (or more) rewritings on the query. What version of SQL-Server do you run? – ypercubeᵀᴹ Aug 06 '12 at 11:26
  • Currently running SQL 2008. Also, thanks RE: the log function. I'm working to combine it with relevance as a weighted total, so that's a work-in-progress. I'll either trim it with INT to use as a type of ranking, or add it to relevance with a linear adjustment. – shannon Aug 06 '12 at 11:40
  • Ah! I didn't realize I could use the alias in the order by. That's where it broke then, and where my confusion came from. I thought it broke when I added the subquery, but at the same time I attempted to use the alias to filter as well. Since that clears up my confusion, I'll probably accept this as the answer. – shannon Aug 06 '12 at 11:53
  • Interestingly, I can ORDER BY LastActivityOn and/or Relevance, but not by a dependent calculation: (LOG(DATEDIFF(WEEK, LastActivityOn, @Today))/LOG(2) - Relevance/125) – shannon Aug 06 '12 at 12:05
  • It looks like OUTER APPLY may do what I need with less code, I'm evaluating the performance now and will post the results. – shannon Aug 06 '12 at 14:12
2

I think including this join may do what I need:

OUTER APPLY (SELECT MAX(ActivityDate) LastActivityOn FROM (VALUES
    (UserRegistration.CreatedOn),
    (UserRegistration.ActivatedOn),
    (UserRegistration.LastLoginOn),
    (UserRegistration.UpdatedOn),
    (UserProfile.PostedOn)) AS AllDates(ActivityDate)) LastActivity

Also added it as conditional WHERE criteria, disabling it with a NULL parameter:

WHERE
  (@OldestUserToSearch IS NULL OR
  LastActivityOn > @OldestUserToSearch) AND

RESULTS

The performance of using this and referencing it in the SELECT was identical to a subselect on SQL Server 2008.

When I add the WHERE predicate is where things start to get hairy. The postal code radius search you can see in the original question is the heaviest part of the calculation, and it worked best at the top of the search, closest to the "TOP 175". Unfortunately the optimizer moved it 5 levels deeper into the execution plan where the distance calculation was ultimately performed against many more rows, when I reused the "OUTER APPLY" output in multiple places. The result was that the query ran about 6 times as long.

Because the performance was identical for the same shape query, and also resulted in less code (not requiring restating my projection or wrapping the entire query in a CTE or subquery), I'm going to call OUTER APPLY the answer I was looking for. Separately, if I need to force the GIS search to the outermost nested loop under all circumstances, I'll have to reformulate the query for it.

Summary of the options presented: How can I avoid repeating a calculated expression multiple times in the same select?

Some helpful similar uses for APPLY:

Local examples on subqueries and CTE (which I rejected as answers):

Unrelated/unhelpful articles with related titles:

Community
  • 1
  • 1
shannon
  • 8,664
  • 5
  • 44
  • 74