0

For 14 years, I have lived by the sworn reference "Never use a UDF on more than one or two rows being affected."

I was about to share just how bad UDFs are with my team, by using a very basic example:

/*
CREATE FUNCTION dbo.Test (@Numerator Float, @Denominator Float)
RETURNS Float AS
BEGIN
DECLARE @Return Float;
SET @Return = @Numerator / NullIf(@Denominator,0);
RETURN @Return
END
GO
*/
----------------------------------------------------------------------
SELECT rn / NullIf(3.00,0)
FROM
       (
       SELECT TOP 1000000 rn = Convert(Float,ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
       FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
       ORDER BY s1.[object_id]
       ) tbl;
----------------------------------------------------------------------
SELECT dbo.Test (rn,3.00)
FROM
       (
       SELECT TOP 1000000 rn = Convert(Float,ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
       FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
       ORDER BY s1.[object_id]
       ) tbl;
----------------------------------------------------------------------
--DROP FUNCTION dbo.Test;
GO

However, when I was ready to do the show-n-tell, the length of time it took to run the first block and the second block were almost the exact same, about 8 seconds.I remember running a very similar experiment about 6 years ago, and noticed an exponential difference in time to execute.

Have UDFs become safe to use now? What's changed?

GMB
  • 216,147
  • 25
  • 84
  • 135
Gregory Hart
  • 137
  • 1
  • 7
  • 2
    UDFs are not evil. Like all code, they can be written poorly or in a less than optimal fashion. For example, your UDF has multiple statements when it can be reduced to one >>> RETURN @Numerator/NullIf(@Denominator,0) << – John Cappelletti Oct 15 '19 at 22:43
  • 2
    Like all tools they have pros and cons. But like all things SQL, performance is totally situational. Avoiding them in a `where` clause might be a valid rule, but then do you want to repeat your SQL in multiple locations and run the risk of forgetting to update one of them. In many situations a UDF can be replaced with an ITVF which has none of the typical UDF bad performance as its inserted inline like a view. – Dale K Oct 15 '19 at 22:51
  • I was aware of the single statement of just doing a straight "RETURN", as that was my first approach. When it returned quickly, I exaggerated it to mimic most other UDFs, of the most basic. It still returned quickly, hence, my decision was to post the more complex version to more further prove the point that performance of simple UDFs has significantly improved (or, ...maybe my memory is really bad). – Gregory Hart Oct 15 '19 at 23:07
  • I'm sure every part of SQL Server will be performing better 14 years on :) – Dale K Oct 15 '19 at 23:09
  • I respectfully disagree with John's assertion that UDFs are not evil. To prove my point I challenge anyone to fill a table with say, 100,000 numbers then write a query that returns all the values and adds 1 to them. Now write a scalar udf that does the same (adds 1 to the value) Note how much slower the query gets. – Alan Burstein Oct 15 '19 at 23:41
  • Alan Burstein - Your request is exactly what my function above is (with the change of calling one million rows instead of only 100,000). It is the most simplistic, short of not declaring the variable, but merely setting a return value. The question is, "What changed?" – Gregory Hart Oct 16 '19 at 00:28
  • 2
    @AlanBurstein My assertion stands. Every tool has a value when utilized correctly. Firstly, I suspect most rational people would never use a UDF as challenged, and if they did, they would quickly realize it was a silly idea. Secondly, performance is important, but it NOT EVERYTHING. Data integrity, manageability, usability, ... should also be considered. For example, UDFs can increase consistency by reducing TOUCH POINTS. A client added a risk rating. They consumed thousands of man hours identifying, adjusting and validating countless legacy reports and processes. I had one touch point. – John Cappelletti Oct 16 '19 at 01:51

1 Answers1

1

If by "safe" you mean "not slow and/or inefficient" then the answer is: Inline = good, not inline = bad.* Don't take my work for it, just ask Microsoft:

https://learn.microsoft.com/en-us/sql/relational-databases/user-defined-functions/scalar-udf-inlining?view=sql-server-ver15#performance-of-scalar-udfs

T-SQL Inline functions can be slow when poorly written but T-SQL Scalar UDFs and multi-statement table valued functions (mTVF) are guaranteed to be performance destroyers.

Here's a couple links where I put together some tests and or discuss this in more detail.

When would you use a table-valued function?

https://www.sqlservercentral.com/forums/topic/inline-vs-multi-statement-table-valued-function-temp-tables-vs-ctes#post-1784973

Now, SQL Server 2019 introduces inline scalar and mTVFs. These are better than their predecessors but aren't perfect. For example, using Scalar UDFs as CHECK Constraints or computed columns will force the optimizer to choose a serial plan with queries that reference that table (insert/update/delete AND SELECTs). This is true even when the column which references the function is not called in the query. This is still true in SQL 2019.

Alan Burstein
  • 7,770
  • 1
  • 15
  • 18
  • 1
    I remember reading somewhere that inline functions have better performance because the query optimizer can use them as if they are a part of the code that use them, meaning that the optimization process is done on the entire code that actually runs, but that's not the case for non-inline functions. I like the orwellian reference, but that's not the only reason I've upvoted. – Zohar Peled Oct 16 '19 at 07:04