
I have written a simple but relatively computationally expensive UDF that queries a rarely changing table. In typical usage this function is called many, many times from WHERE clauses over a very small domain of parameters.

What can I do to make my usage of the UDF faster? My thought is that there should be some way to tell SQL Server that my function returns the same result for the same parameters and thus should be memoized. There doesn't seem to be a way to do this within the UDF itself, because UDFs are required to be pure and thus can't write to a temp table.

For completeness, my UDF is below, though I am seeking a general answer on how to make calls to UDFs over small domains faster, not how to optimize this particular UDF.

CREATE FUNCTION [dbo].[WorkDay] (
    @inputDate datetime,
    @offset int)
RETURNS datetime AS
BEGIN

    DECLARE @result datetime

    SET @result = @inputDate

    -- Step one calendar day at a time in the direction of @offset
    WHILE @offset != 0
    BEGIN
        SET @result = DATEADD(day, SIGN(@offset), @result)

        -- Keep stepping while @result falls on a weekend (weekday 2-6 is
        -- Monday-Friday under the default US DATEFIRST setting) or on a
        -- US holiday listed in the holidays table
        WHILE (DATEPART(weekday, @result) NOT BETWEEN 2 AND 6)
           OR @result IN (SELECT date FROM myDB.dbo.holidays
                          WHERE calendar = 'US' AND date = @result)
        BEGIN
            SET @result = DATEADD(day, SIGN(@offset), @result)
        END

        SET @offset = @offset - SIGN(@offset)
    END

    RETURN @result

END
Roger
    You can use a calendar table - it is going to be much more efficient than all of this looping in a multi-statement scalar UDF. http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-calendar-table.html – Aaron Bertrand Feb 22 '12 at 21:06
  • You can't really. It will always be called per row. If possible don't use scalar UDFs (especially ones that do data access as yours does) and find another way. Where possible converting to an inline TVF can sometimes help. – Martin Smith Feb 22 '12 at 21:15
  • @MartinSmith, reading up on inline TVFs now. – Roger Feb 22 '12 at 22:37
  • @Roger - Hopefully you found [Scalar functions, inlining, and performance: An entertaining title for a boring post](http://sqlblog.com/blogs/adam_machanic/archive/2006/08/04/scalar-functions-inlining-and-performance-an-entertaining-title-for-a-boring-post.aspx) – Martin Smith Feb 22 '12 at 22:39
  • @MartinSmith, read up on TVF. Unfortunately my UDFs need something like T-SQL and would be awkward as pure SQL queries. – Roger Feb 22 '12 at 22:53
  • @AaronBertrand, I have read that link but I have issues with such a solution. When I want to lookup a date a certain offset from today, I have to look up the index for today, increment it by the offset, and look up the resultant date. This means I have to join the calendar table twice for each date I want an offset of. I can see my queries getting very complicated quickly if I need to do this more than once. Ideas? – Roger Feb 22 '12 at 23:25
  • I don't think you have to join the calendar table twice, but I'm having a hard time figuring out how to explain the approach without understanding the given data point and the desired result. Can you add a few examples of input and expected output, including information like weekdays and holidays that need to be skipped? – Aaron Bertrand Feb 22 '12 at 23:28

2 Answers


My first thought here is: what's the actual performance problem? Sure, the UDF is called once per row to apply the WHERE clause, and inside it a loop runs a query. But are you getting poor execution plans? Are your result sets huge? Let's turn to the generic question, though: how does one solve this problem? SQL doesn't really do memoization (as the illustrious @Martin_Smith points out). So what's a boy to do?

Option 1 - New Design

Create an entirely new design. In this specific case @Aaron_Bertrand points out that a calendar table may meet your needs. Quite right. This doesn't really help you with non-calendar situations, but as is often the case in SQL, you need to think a bit differently.
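
As a minimal sketch of that idea (table and column names here are hypothetical, and only positive offsets are shown; negative offsets and non-working start dates need a little more care): number only the working days, and "N working days from a date" becomes arithmetic on that number plus a single lookup.

    -- Hypothetical auxiliary calendar table: one row per calendar date,
    -- with working days numbered consecutively (weekends and holidays
    -- carry the number of the preceding working day).
    CREATE TABLE dbo.Calendar (
        TheDate       datetime NOT NULL PRIMARY KEY,
        IsWorkDay     bit      NOT NULL,
        WorkDayNumber int      NOT NULL
    );

    -- "@inputDate plus @offset working days" is now arithmetic plus one lookup.
    DECLARE @inputDate datetime, @offset int;
    SET @inputDate = '20120222';
    SET @offset = 5;

    SELECT TheDate
    FROM dbo.Calendar
    WHERE IsWorkDay = 1
      AND WorkDayNumber = @offset +
          (SELECT WorkDayNumber FROM dbo.Calendar WHERE TheDate = @inputDate);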

Option 2 - Call the UDF Less

Narrow the set of items that call this function. This reminds me a lot of how to do successful paging/row counting. Generate a small result set that has the distinct values required and then call your UDF so it is only called a few times. This may or may not be an option, but can work in many scenarios.
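
As a sketch of that pattern, assuming a hypothetical Orders table with an OrderDate column: evaluate the UDF once per distinct parameter value and join the small result back to the full set. Materializing the distinct values into a temp table makes sure the function really is called only once per value.

    -- Hypothetical example: call WorkDay once per distinct OrderDate,
    -- not once per row of dbo.Orders.
    SELECT DISTINCT OrderDate
    INTO #Dates
    FROM dbo.Orders;

    -- The expensive UDF now runs against the small distinct set only.
    SELECT OrderDate,
           dbo.WorkDay(OrderDate, 5) AS DueDate
    INTO #ShiftedDates
    FROM #Dates;

    SELECT o.*
    FROM dbo.Orders AS o
    JOIN #ShiftedDates AS s
      ON s.OrderDate = o.OrderDate
    WHERE s.DueDate >= '20120301';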

Option 3 - Dynamic UDF

I'll probably get booed out of the room for this suggestion, but here goes. What makes this UDF slow is the SELECT statement inside the loop. If your holiday table really changes infrequently, you could put a trigger on the table. The trigger would write out an updated UDF. The new UDF could brute-force all the holiday decisions. Would it be a bit like cannibalism, with SQL writing SQL? Sure. But it would get rid of the sub-query and speed the UDF up. Let the heckling begin.
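
A rough sketch of that idea, with a hypothetical helper function name (a real version would also have to handle an empty holiday list, permissions, and the fact that CREATE TRIGGER must run in the same database as the table): the trigger rebuilds a function whose body hard-codes the holidays, and WorkDay would then call that function instead of running the sub-query.

    -- Hypothetical trigger: whenever the holiday list changes, regenerate a
    -- helper function that hard-codes the US holidays in its body.
    CREATE TRIGGER dbo.trg_RebuildHolidayUdf
    ON dbo.holidays
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        DECLARE @dates nvarchar(max), @sql nvarchar(max);

        -- Build a quoted, comma-separated list of the US holiday dates.
        SELECT @dates = STUFF((
            SELECT ',''' + CONVERT(char(8), date, 112) + ''''
            FROM dbo.holidays
            WHERE calendar = 'US'
            FOR XML PATH('')), 1, 1, '');

        SET @sql = N'ALTER FUNCTION dbo.IsUsHoliday (@d datetime)
            RETURNS bit AS
            BEGIN
                RETURN CASE WHEN @d IN (' + @dates + N') THEN 1 ELSE 0 END
            END';

        EXEC (@sql);
    END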

Option 4 - Memoize It!

While SQL can't directly memoize, we do have SQL CLR. Convert the UDF to a SQL CLR UDF. In CLR you get to use static variables. You could easily grab the Holidays table at some regular interval and store it in a hash table, then rewrite your loop in the CLR function. You could even go further and memoize the entire answer if that's appropriate for your logic.


Update:

Option 1 - I was really trying to focus on the general case here, not the example function you used above. However, the current design of your UDF allows for multiple calls to the Holiday table if you happen to hit a few holidays in a row. Using some sort of calendar-style table that contains a list of 'bad days' and the corresponding 'next business day' would let you remove the potential for multiple hits and queries.
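
A sketch of that narrower table, with hypothetical names: one row per non-working date, mapped straight to its next business day, so the skip-ahead becomes a single lookup.

    -- Hypothetical mapping table: one row per weekend day or holiday.
    CREATE TABLE dbo.NonWorkDays (
        BadDay          datetime NOT NULL PRIMARY KEY,
        NextBusinessDay datetime NOT NULL
    );

    -- One lookup replaces the inner skip-ahead loop; a date missing from
    -- the table is already a business day.
    DECLARE @d datetime;
    SET @d = '20120505';

    SELECT COALESCE(
        (SELECT NextBusinessDay FROM dbo.NonWorkDays WHERE BadDay = @d),
        @d) AS NextWorkingDate;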

Option 3 - While the domain is unknown ahead of time, you could very well modify your holiday table so that, for a given holiday, it also contains the next corresponding work day. From that data you could spit out a UDF with a long CASE expression (WHEN '5/5/2012' THEN '5/14/2012', or something similar) at the bottom. This strategy may not work for every type of problem, but could work well for some.

Option 4 - There are implications to every technology. The CLR assembly needs to be deployed, the SQL Server configuration modified, and SQL CLR is limited to the 3.5 framework. Personally, I've found these adjustments easy enough, but your situation may be different (say, a recalcitrant DBA, or restrictions on modifications to production servers).

Using static variables requires the assemblies be granted FULL TRUST. You'll have to make sure you get your locking correct.
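
To make the T-SQL side of that concrete (the C# body is omitted; the assembly, class, and file names below are hypothetical), deployment might look roughly like this. Writable static fields are what push the assembly to the UNSAFE permission set.

    -- One-time server setting to allow CLR code to run.
    EXEC sp_configure 'clr enabled', 1;
    RECONFIGURE;

    -- Register the compiled assembly; UNSAFE (full trust) also needs a
    -- signed assembly or a TRUSTWORTHY database.
    CREATE ASSEMBLY WorkDayClr
    FROM 'C:\deploy\WorkDayClr.dll'
    WITH PERMISSION_SET = UNSAFE;

    -- Expose the CLR method as a T-SQL function.
    CREATE FUNCTION dbo.WorkDayClr (@inputDate datetime, @offset int)
    RETURNS datetime
    AS EXTERNAL NAME WorkDayClr.[WorkDayFunctions].[WorkDay];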

There is some evidence that at very high transaction levels CLR doesn't perform as well as direct SQL. In your scenario, however, this observation might not be applicable, because there isn't a direct SQL equivalent for what you're trying to do (memoization).

EBarr
  • Option 1: While 'think different' is true in general for SQL problems, it can't be used as a cookie-cutter approach to optimizing the common UDF use case I described. The calendar table also doesn't solve my issue. Option 2: This works and is what I am doing now. My question arose while I was doing it and thinking 'there has got to be a simpler way'. Option 3: Suffers from the same pitfall as @Jeff Cuscutis's solution; the domain of my UDF, while small, may be unpredictable. Option 4: My intuition says Bingo, but I have never approached SQL CLR. Are there dangerous implications to this? – Roger Feb 22 '12 at 23:16

You could write to a real table keyed off your parameters, select from that first, and if the lookup comes up empty, calculate the value and insert it into the table, doing your own caching.
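
A sketch of that caching pattern (table and procedure names are hypothetical; because a UDF cannot write, the check-and-fill step has to live in a stored procedure or in the calling code, and a real version would need to handle two callers missing the cache at the same time):

    -- Hypothetical cache table keyed by the UDF's parameters.
    CREATE TABLE dbo.WorkDayCache (
        InputDate datetime NOT NULL,
        Offset    int      NOT NULL,
        Result    datetime NOT NULL,
        PRIMARY KEY (InputDate, Offset)
    );

    -- Look in the cache first; compute and remember the answer on a miss.
    CREATE PROCEDURE dbo.GetWorkDay
        @inputDate datetime, @offset int, @result datetime OUTPUT
    AS
    BEGIN
        SET @result = NULL;

        SELECT @result = Result
        FROM dbo.WorkDayCache
        WHERE InputDate = @inputDate AND Offset = @offset;

        IF @result IS NULL
        BEGIN
            SET @result = dbo.WorkDay(@inputDate, @offset);
            INSERT dbo.WorkDayCache (InputDate, Offset, Result)
            VALUES (@inputDate, @offset, @result);
        END
    END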

It might make more sense to pre-fill a table with all possible values for the date range you are interested in and then just join to that. You are then only doing the calculation once for each combination of parameters and letting SQL handle the join.
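
For example, assuming you only ever care about a bounded range of dates and offsets (the Orders table and the ranges below are illustrative), the cache table above could be filled up front and then simply joined to:

    -- Pre-compute every (date, offset) pair you expect to need.
    DECLARE @d datetime, @o int;
    SET @d = '20120101';
    WHILE @d < '20130101'
    BEGIN
        SET @o = -10;
        WHILE @o <= 10
        BEGIN
            INSERT dbo.WorkDayCache (InputDate, Offset, Result)
            VALUES (@d, @o, dbo.WorkDay(@d, @o));
            SET @o = @o + 1;
        END
        SET @d = DATEADD(day, 1, @d);
    END

    -- Later queries join to the table instead of calling the UDF per row.
    SELECT o.*
    FROM dbo.Orders AS o
    JOIN dbo.WorkDayCache AS w
      ON w.InputDate = o.OrderDate AND w.Offset = 5
    WHERE w.Result >= '20120301';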

Jeff Cuscutis
  • The issue with the approach is that while the domain I use in practice is small, it is unknown ahead of time and could potentially be huge. In the case of my UDF for every date there could be 2^32 unique offsets. – Roger Feb 22 '12 at 22:34
  • SQL Server is pretty good at math. Counting rows in a table between date and date + x should be pretty easy regardless of how many billions of different offsets you might expect. I strongly recommend you give the approach a try instead of assuming that it won't be faster than what you're doing now (which is NOT what SQL Server is good at). – Aaron Bertrand Feb 23 '12 at 00:17