T-SQL UDF vs full expression run-time

Question

I'm trying to make my query more readable by using a UDF in SQL Server, but the run time increases dramatically when I use the function.

Following is the function I'm using:

create function DL.trim_all(@input varchar(max)) 
returns varchar(max)
as begin 
    set @input=replace(replace(replace(@input,' ',''),')',''),'(','')
    return @input
end

Instead of writing:

SELECT
CASE WHEN replace(replace(replace([FULL_NAME_1],' ',''),')',''),'(','')=replace(replace(replace([FULL_NAME_2],' ',''),')',''),'(','') THEN 1 ELSE 0 END AS [name_match],
CASE WHEN replace(replace(replace([ADDRESS_1],' ',''),')',''),'(','')=replace(replace(replace([ADDRESS_2],' ',''),')',''),'(','') THEN 1 ELSE 0 END AS [adrs_match]
.
.
.
FROM
TABLE_1

for 20 different fields.

With the function the query runs in 12.5 minutes; without it, the same query runs in 45 seconds.

Any ideas?

@GordonLinoff I'd appreciate a little more detail. Is there any way to overcome the issue? Otherwise, what is the point of a UDF? – Adirmola, Apr 10 '19 at 12:55
The point is to serve as insidious traps and valuable learning experiences for innocent developers... but seriously, scalar UDFs are perfectly usable -- as long as you never use them in any query processing more than a handful of rows. Search "scalar udf performance" in any search engine of your choice to find many references on this topic (as well as the improvements made in SQL Server 2019, which still do not resolve all the issues). The workaround, if you absolutely want a function, is to use inline table-valued functions or CLR functions, which don't have these problems. — Jeroen Mostert, Apr 10 '19 at 13:08
@JeroenMostert You are correct, but I suspect diminishing returns using a TVF on 20 columns. — John Cappelletti, Apr 10 '19 at 13:10
@JohnCappelletti: compared to a bare expression, certainly, but even just the fact that inline TVFs don't inhibit parallelism can be the difference between "unworkably slow" and "slower, but at least something I can live with". — Jeroen Mostert, Apr 10 '19 at 13:12
One more thing that might improve performance is to stop using `varchar(max)` and use a reasonable length instead (that is, unless you suspect strings with more than 8000 chars in them) — Zohar Peled, Apr 10 '19 at 15:42

score 2 · Answer 1 · edited Apr 10 '19 at 16:06

Taking John's idea one step further: converting the scalar function into an inline table-valued function and using CROSS APPLY to invoke it for each pair of columns might give even better performance, at the price of a more cumbersome query:

CREATE function DL.DoesItMatch(@s1 varchar(500),@s2 varchar(500)) 
returns table -- returns a table with a single row and a single column
as return 
  SELECT 
    CASE WHEN replace(replace(replace(@s1,' ',''),')',''),'(','') = 
              replace(replace(replace(@s2,' ',''),')',''),'(','') THEN 1 ELSE 0 END As IsMatch;

and the query:

SELECT NameMatch.IsMatch AS [name_match],
       AddressMatch.IsMatch AS adrs_match
.
.
.
FROM TABLE_1
CROSS APPLY DL.DoesItMatch(FULL_NAME_1, FULL_NAME_2) As NameMatch
CROSS APPLY DL.DoesItMatch(ADDRESS_1, ADDRESS_2) As AddressMatch

I may test this later. As I mentioned in the comments above, I'm not sure 20 cross apply would be any more performant. Either way +1 from me as well — John Cappelletti, Apr 10 '19 at 16:13

score 1 · Answer 2 · answered Apr 10 '19 at 13:08

I can't imagine a huge boost, but how about an alternative approach:

create function DL.DoesItMatch(@s1 varchar(500),@s2 varchar(500)) 
returns bit
as begin 
    return CASE WHEN replace(replace(replace(@s1,' ',''),')',''),'(','')=replace(replace(replace(@s2,' ',''),')',''),'(','') THEN 1 ELSE 0 END
end

Then call the function as:

SELECT 
      DL.DoesItMatch([FULL_NAME_1],[FULL_NAME_2])  AS [name_match],
      ...
FROM
TABLE_1

John Cappelletti


This should improve performance, if only because it executes the UDF half as many times as the original query... – Zohar Peled Apr 10 '19 at 15:44
@ZoharPeled My thoughts as well. – John Cappelletti Apr 10 '19 at 15:46
Sorry for riding on your idea, hope the upvote can at least make up for it :-) – Zohar Peled Apr 10 '19 at 15:54

score 1 · Answer 3 · answered Apr 10 '19 at 17:59

Inlining is always the way to go. Period. Even without considering the parallelism-inhibiting aspects of T-SQL scalar UDFs, ITVFs are faster, require fewer resources (CPU, memory and IO), and are easier to maintain and easier to troubleshoot/analyze/profile/trace. For fun I put together a performance test comparing Zohar's ITVF to John's scalar UDF. I created 250K rows, tested a basic SELECT against both, then ran another test with an ORDER BY against the heap to force a sort.

Sample data:

-- Sample Data
BEGIN
  SET NOCOUNT ON;
  IF OBJECT_ID('tempdb..#tmp','U') IS NOT NULL DROP TABLE #tmp;
  SELECT TOP (250000) col1 = '('+LEFT(NEWID(),10)+')', col2 = '('+LEFT(NEWID(),10)+')'
  INTO    #tmp
  FROM   sys.all_columns a, sys.all_columns;

  UPDATE #tmp SET col1 = col2 WHERE LEFT(col1,2) = LEFT(col2,2) 
END
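The test scripts below call DL.DoesItMatch_ITVF, which this answer never defines. A minimal sketch of what it presumably looks like, assuming it is simply Zohar's inline TVF from Answer 1 created under a different name so that it can coexist with John's scalar DL.DoesItMatch (the body and the varchar(500) lengths are copied from Answer 1; the pairing with the _ITVF name is an assumption):

```sql
-- Assumed definition: the inline TVF from Answer 1, renamed with an _ITVF suffix
-- so it does not collide with the scalar DL.DoesItMatch used in the same test.
CREATE FUNCTION DL.DoesItMatch_ITVF (@s1 varchar(500), @s2 varchar(500))
RETURNS TABLE -- one row, one column
AS RETURN
    SELECT CASE WHEN REPLACE(REPLACE(REPLACE(@s1, ' ', ''), ')', ''), '(', '') =
                     REPLACE(REPLACE(REPLACE(@s2, ' ', ''), ')', ''), '(', '')
                THEN 1 ELSE 0 END AS isMatch;
GO
```

The scalar DL.DoesItMatch from Answer 2 has to exist as well before running the tests.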

Performance Test:

PRINT 'scalar, no sort'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @isMatch BIT;
  SELECT @isMatch = DL.DoesItMatch(t.col1,t.col2)
  FROM   #tmp AS t;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

PRINT CHAR(10)+'ITVF, no sort'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @isMatch BIT;
  SELECT      @isMatch = f.isMatch
  FROM        #tmp AS t
  CROSS APPLY DL.DoesItMatch_ITVF(t.col1,t.col2) AS f;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3    

PRINT CHAR(10)+'scalar, sorted set'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @isMatch BIT;
  SELECT @isMatch = DL.DoesItMatch(t.col1,t.col2)
  FROM   #tmp AS t
  ORDER BY DL.DoesItMatch(t.col1,t.col2);
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

PRINT CHAR(10)+'ITVF, sorted set'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @isMatch BIT;
  SELECT      @isMatch = f.isMatch
  FROM        #tmp AS t
  CROSS APPLY DL.DoesItMatch_ITVF(t.col1,t.col2) AS f
  ORDER BY    f.isMatch;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

Test Results:

scalar, no sort
------------------------------------------------------------
Beginning execution loop
844
843
840
Batch execution completed 3 times.

ITVF, no sort
------------------------------------------------------------
Beginning execution loop
270
270
270
Batch execution completed 3 times.

scalar, sorted set
------------------------------------------------------------
Beginning execution loop
937
930
936
Batch execution completed 3 times.

ITVF, sorted set
------------------------------------------------------------
Beginning execution loop
196
190
190
Batch execution completed 3 times.

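So, without the sort, where no parallel plan is needed, the ITVF is roughly 3x faster (about 270 ms vs. 840 ms). With the sort, where a parallel plan helps and the scalar UDF prevents one, it is roughly 5x faster (about 190 ms vs. 935 ms). Here are a few other links where I have tested ITVFs against scalar and multi-statement table-valued UDFs: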

Set based plan runs slower than scalar valued function with many conditions

SQL Server user defined function to calculate age bracket

Function is slow but query runs fast

Why does SQL Server say this function is nondeterministic?

Grouping based on the match percentage

SQL Server 2008 user defined function to add spaces between each digit

Sql table comma separated values contain any of variable values checking

SQL String manipulation, find all permutations

Karthik · Answer 4 · 2019-04-10T17:08:21.550

You could use scalar UDF inlining in SQL Server 2019. With it, you can keep the UDF you have written and should automatically get performance close to that of the query without the UDF.

The UDF you have given fits the criteria for inlineability so you are in good shape. Documentation about the UDF inlining feature is here: https://learn.microsoft.com/en-us/sql/relational-databases/user-defined-functions/scalar-udf-inlining?view=azuresqldb-current
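As a rough sketch of how one might verify this on a SQL Server 2019 instance: the catalog view sys.sql_modules exposes an is_inlineable column, and inlining only applies when the database runs at compatibility level 150 or higher. The function name DL.trim_all is taken from the question; everything else is standard catalog metadata.

```sql
-- Scalar UDF inlining needs SQL Server 2019+ and compatibility level 150 or higher.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- is_inlineable = 1 means the function is eligible for inlining;
-- eligibility does not guarantee that every query will inline it.
SELECT OBJECT_SCHEMA_NAME(m.object_id) AS udf_schema,
       OBJECT_NAME(m.object_id)        AS udf_name,
       m.is_inlineable
FROM sys.sql_modules AS m
WHERE m.object_id = OBJECT_ID(N'DL.trim_all');

-- Inlining is on by default at level 150; it can be switched per database if needed.
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON;
```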

Pro tip: I'd suggest making a minor modification to your UDF before using scalar UDF inlining: turn it into a single-statement scalar UDF by dropping the SET and returning the expression directly. With this, you should be better off than with an inline TVF and CROSS APPLY.
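A sketch of that single-statement form, keeping the name and signature from the question (CREATE OR ALTER is used here for convenience and is an addition, not part of the original UDF):

```sql
-- Same logic as the question's DL.trim_all, reduced to a single RETURN:
-- no SET, no reassignment of @input.
CREATE OR ALTER FUNCTION DL.trim_all (@input varchar(max))
RETURNS varchar(max)
AS
BEGIN
    RETURN REPLACE(REPLACE(REPLACE(@input, ' ', ''), ')', ''), '(', '');
END
GO
```

The calling query stays unchanged. If the strings are known to be short, a bounded type such as varchar(500), as suggested in the comments on the question, may help further.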
