0

I am trying to summarize banner ad views from a table that is pretty good size(18,243,847 rows). I need a count for views in the past two years. I tried adding an index to the date and tried different variations of the query below. Most runs are about 25 seconds, however it seems when passed in a web service, the target page is timing out. I know the issue is with count, but have not been able to reduce that portion to lower than 11 seconds. Seems like not a lot, but why the issue with my web service? Anyway, first things first, is this query doing the best I can do?

SELECT ba.adID, ba.name, ba.description, ba.startDate, ba.endDate, isNull(v.viewCount,0) AS viewCount, isNull(c.clickCount,0) AS clickCount 
FROM bannerAds ba 
    LEFT OUTER JOIN (SELECT adID, count(viewID) AS viewCount 
                     FROM bannerAdsViews 
                     WHERE viewDateTime IS NOT NULL AND viewDateTime  >= DateAdd(yy, -2, GetDate())
                     GROUP BY adID) v ON ba.adID = v.adID 
    LEFT OUTER JOIN (SELECT adID, count(viewID) AS clickCount 
                     FROM bannerAdsViews 
                     WHERE clickDateTime IS NOT NULL AND viewDateTime  >= DateAdd(yy, -2, GetDate())
                     GROUP BY adID) c ON ba.adID = c.adID 
WHERE viewCount > 0
ORDER BY name ASC 
FOR XML RAW ('Banner'), ROOT ('Banners');
Connie DeCinko
  • 996
  • 5
  • 19
  • 39
  • Not an answer per say but why not create a daily summarized view of the data for each of your banners and then query that. So rather than traversing 18 million+ records you only have to look at a few hundred (depending on your date range) if you then have this update as part of a scheduled job, trigger or the point you are recording these stats then you don't have to keep working out the calculations every time you want to run this query. – David Shorthose Oct 13 '15 at 16:13
  • 1
    To optimze a query i first run an explain for the statement, otherwise you will only optimize the wrong parts. – AlexN Oct 13 '15 at 17:12

1 Answers1

2

This query can be difficult to get really good performance on. You are summarizing a lot of data.

However, two subqueries are not needed. If I make the assumption the viewID and viewDateTime are both NULL on the same records, then I think this version is equivalent:

SELECT ba.adID, ba.name, ba.description, ba.startDate, ba.endDate, 
       COALESCE(vc.viewCount, 0) as viewCount,
       COALESCE(vc.clickCount, 0) as clickCount 
FROM bannerAds ba JOIN
     (SELECT adID, count(viewDateTime) as viewCount, 
             count(clickDateTime) as clickCount
      FROM bannerAdsViews 
      WHERE viewDateTime  >= DateAdd(year, -2, GetDate())              
      GROUP BY adID
     ) vc
     ON ba.adID = v.adID 
WHERE viewCount > 0
ORDER BY name ASC 
FOR XML RAW ('Banner'), ROOT ('Banners');

The INNER JOIN can replace the LEFT JOIN, because the WHERE clause is removing NULL values anyway.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Looks good, brought it down to 10 seconds. I didn't think about combining the two counts into a single query, that seems to make a lot of sense. Is COALESCE faster or just preferred or isNull? – Connie DeCinko Oct 13 '15 at 16:09
  • Looks like `ISNULL` is a little bit faster. See http://stackoverflow.com/questions/2287642/which-is-quicker-coalesce-or-isnull and http://dba.stackexchange.com/questions/4274/performance-difference-for-coalesce-versus-isnull. – ForguesR Oct 13 '15 at 16:23
  • @ConnieDeCinko . . . I prefer `coalesce()` because it is ANSI standard and accepts more than two arguments. SQL Server has a better implementation for `isnull()` (because the first argument is not evaluated twice). For just fetching a column, the difference is negligible. When the first argument is expensive, such as a function all or subquery, `isnull()`is definitely preferred for performance reasons. – Gordon Linoff Oct 13 '15 at 21:14