3

I have been researching SQL Server CLR UDFs and parallelism for some time. The general consensus seems to be that in SQL Server 2008 and later, a scalar value CLR UDF with DataAccessKind.None should allow parallel execution.

However, when I use my scalar value UDF in my view in SQL Server 2012, it still kills parallel execution in joins and the like.

Is there something special I need to add to my C# code or the T-SQL UDF definition to indicate that it is safe for parallel execution?

Thanks.

Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171
Neil Weicher
  • 2,370
  • 6
  • 34
  • 56
  • At a moderator's recommendation, I am providing a link to an earlier thread I posted on the subject: http://social.msdn.microsoft.com/Forums/sqlserver/en-US/499731fa-c99b-4db3-80cf-7059edc52ed6/best-practices-for-maximum-performance-of-c-clr-scalar-udf-in-view?forum=sqlnetfx – Neil Weicher Jul 13 '14 at 14:13
  • Are you doing anything in the UDF that would prevent it from being considered deterministic? If no, then are you specifying `IsDeterministic = true` in the `SqlFunction()` attribute? `IsDeterministic` is false by default. Same goes for `IsPrecise`. – Solomon Rutzky Jul 13 '14 at 14:47
  • If setting `IsDeterministic` and/or `IsPrecise` do the trick (assuming that you are not doing anything in the function to prevent setting those to `true`), then I will post an answer here as well as follow up on that MSDN forum. – Solomon Rutzky Jul 13 '14 at 14:58
  • I'm pretty sure my UDF is Deterministic. Can you give me an example of something that is not Deterministic? (Besides, say, RAND(), @@IDLE, etc) – Neil Weicher Jul 14 '14 at 13:26
  • determinism means that the same inputs are _guaranteed_ to have the same output. So if your function will always return the same value for a particular set of inputs, then it is deterministic and should be marked as `IsDeterministic = true`. You can set `IsPrecise = true` if you are not using any floating point (i.e. Double or Single) values. – Solomon Rutzky Jul 14 '14 at 14:09
  • I uploaded the estimated plans here for an inner join: www.netlib.com/private/sqlplans.zip. It includes: * Raw.sqlplan (no UDF) in View * NonDeterministic.sqlplan (IsDeterministic=false) * Deterministic.sqlplan (IsDeterministic=true) Using IsPrecise=true did not effect the plan, although it does not use any floating point. I am working on the actual plans but it is a huge database and the queries take a long time. – Neil Weicher Jul 14 '14 at 18:22
  • We need to wait to see what the actual plans show (at least between Raw and Deterministic), but just from looking at the estimated plans it looks like marking the function as `IsDeterministic=true` has made a world of difference. – Solomon Rutzky Jul 14 '14 at 19:20
  • I uploaded http://www.netlib.com/private/realplans.zip. It contains: Deterministic.real.sqlplan and Raw.real.sqlplan. Interestingly adding IsPrecise seemed to improve performance by about 5% even though the plans were identical. But maybe that was a statistical anomaly. Even though D is light years better than ND, Raw is still about 2000% faster than D. Is that just the CLR interface switch? All the CLR API is doing is returning the input string verbatim. – Neil Weicher Jul 15 '14 at 18:59
  • `IsDeterministic` & `IsPrecise` are meta-data for the optimizer, which is why it is best to set them to true (assuming that accurately reflects the code). So I don't think it was a statistical anomaly. Why do you say that Raw is better than D? According to the "real" plans, a) both are parallel (DOP = 2), and b) the D plan has less estimated cost (63 vs 89) and lower memory grant. If "Raw" means no function at all then that is nearly always faster than having any type of function. How long did each query run for? Still, looks like I was right about how to get a CLR UDF to go parallel :). – Solomon Rutzky Jul 15 '14 at 19:37
  • Also, do you have the input parameter and return value defined as NVARCHAR(MAX) or NVARCHAR(1 - 4000)? If you are curious about performance, I did a study and published it here: https://www.simple-talk.com/sql/t-sql-programming/clr-performance-testing/ – Solomon Rutzky Jul 15 '14 at 19:59
  • Yes, your tip about IsDeterministic was an enormous help. Regarding the input parameter: Yes, the parameter and return are defined as NVARCHAR(4000). The Raw (no API in view) query took about 3 seconds, while the query with the D API plan took about 60 seconds. (I terminated the query with the ND API after a couple of hours!) As I mentioned, all the test API is doing is returning the input SqlString as-is. Where are the other 57 seconds being spent? I ran each test a handful of times (after dropping clean buffers) and the results were pretty consistent. – Neil Weicher Jul 15 '14 at 21:02
  • I am not certain about the exact inner workings of the API, but I don't think there is anything you can do about that. However, the Function is in a View that you are selecting from, so likely that 57 seconds is spread across all of the rows that the View pulls back prior to any filters that can be applied by the Optimizer coming from the outer query. – Solomon Rutzky Jul 15 '14 at 21:27
  • This is the UDF. All it does is return the input value: `[Microsoft.SqlServer.Server.SqlFunction(DataAccess=DataAccessKind.None, IsDeterministic=true, IsPrecise=true)] public static SqlString MyUDF(SqlString data) { return data; }` – Neil Weicher Jul 17 '14 at 09:01
  • @srutzky - so it sounds like you are saying that it is due to the overhead of making the transition to CLR? Can you think of anything else I might be able to look at to speed it up? The UDF is wrapped around the column that is the primary key of the table (varchar). – Neil Weicher Jul 17 '14 at 09:04
  • Yes, there is some overhead, though I'm not certain how much. There might be some things that we can try to increase the performance, but that is a different question than this one which is specific to CLR UDFs and parallelism. I think I have sufficiently answered this particular question given that your actual execution plan shows the CLR UDF in a parallel plan. I suggest upvoting / mark-as-answered this one and opening a new question specific to the performance of the current code and we can have that discussion / investigation there. Please add a comment here linking to the new one. – Solomon Rutzky Jul 18 '14 at 03:06
  • 1
    @srutzky - I was thinking the same thing. I will create a new thread and post the link here. Thanks! – Neil Weicher Jul 19 '14 at 14:36
  • @srutzky - here is the followup: http://social.msdn.microsoft.com/Forums/sqlserver/en-US/1afac121-3490-436e-bbb2-34bcf3f71962/improving-clr-performance-in-sql-server-redux?forum=sqlnetfx – Neil Weicher Jul 22 '14 at 15:22

2 Answers2

4

According to the MSDN forum that is linked to in the first comment on the question, your C# code roughly starts out as:

 [Microsoft.SqlServer.Server.SqlFunction()]
    public static SqlString MyUDF(SqlString data)

and according to the question you have added DataAccessKind.None making it:

 [Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.None)]
    public static SqlString MyUDF(SqlString data)

To start with, both DataAccess and SystemDataAccess default to DataAccessKind.None, so setting them explicitly to DataAccessKind.None, while a good practice, shouldn't have made any noticeable difference.

There are two other properties that need to be set: IsDeterministic and IsPrecise. These properties are meta-data that the Query Optimizer uses and are both false by default. Hence, it is best to set one or both of them to true (assuming, of course, that the settings accurately reflect the code within that particular function).

  • Determinism means that the same inputs are guaranteed to have the same output. So if your function will always return the same value for a particular set of inputs, then it is deterministic and should be marked as IsDeterministic = true.
  • You can set IsPrecise = true if you are not using any floating point (i.e. Double or Single) values (i.e. FLOAT or REAL in T-SQL terms).

The SqlFunction attribute should look as follows:

[Microsoft.SqlServer.Server.SqlFunction(SystemDataAccess = DataAccessKind.None,
   DataAccess = DataAccessKind.None, IsDeterministic = true, IsPrecise = true)]
   public static SqlString MyUDF(SqlString data)

UPDATE:

  • One additional item that is probably required is that the assembly containing this method has a PERMISSION_SET of SAFE.
  • It is probably not required to have the IsPrecise property of the SqlFunction attribute set to true in order to get the UDF to work in a parallel execution plan.
Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171
1

It might not be a problem with the CLR UDF but with SQL Server's query plan decision. You can force a parallel plan by using OPTION (QUERYTRACEON 8649) as explained here.

Community
  • 1
  • 1
Michael Ganß
  • 2,846
  • 1
  • 18
  • 11
  • Hi Michael. IF the CLR UDF is marked as both `DataAccessKind.None` (or at least _not_ marked as `DataAccessKind.Read` since `None` is the default) AND `IsDeterministic = true` and still never gets a parallel plan, then it might be something else and in that case the TraceFlag might help. But the UDF does need to be marked as `IsDeterministic = true`. – Solomon Rutzky Jun 08 '15 at 14:11