Background

This question is a follow-up to a previous question. To give you context here as well, I will summarize the previous question: I was looking for a methodology to execute SELECT statements without sending their results to the client. The goal was to measure performance without eating up a lot of resources by sending millions of rows. I am only interested in the time needed to execute the queries, not in the time spent sending the results to the client app. Since I intend to optimize the queries, their results will not change at all; only the methodology will change, and I intend to be able to compare the methodologies.

Current knowledge

In my other question several ideas were presented. One idea was to select the count of the records into a variable; however, that changed the query plan significantly and the results were not accurate in terms of performance. The idea of using a temporary table was presented as well, but creating a temporary table and inserting into it is difficult when we do not know in advance which query we will have to measure, and it also introduces a lot of white noise; so, even though the idea was creative, it was not ideal for my problem. Finally, Vladimir Baranov came up with the idea of creating as many variables as there are columns in the selection. That was a great idea, but I refined it further by creating a single variable of type nvarchar(max) and selecting all my columns into it. The idea works great, except for a few problems. I have a solution for most of the problems, but I would like to share them, so I will describe them regardless; do not misunderstand me, though, I have a single question.
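A minimal sketch of the refined idea (table and column names are made up for illustration): every column of the measured query is assigned, in turn, to the same throw-away variable, so the rows are fully produced but never sent to the client.

DECLARE @container nvarchar(max);

SELECT @container = Col1,
       @container = Col2
FROM   dbo.SomeTable;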

Problem1

If I have a @container variable and I do @container = columnname for each column inside the selection, then I will have conversion problems.

Solution1

Instead of just doing @container = columnname, I need to do @container = cast(columnname as nvarchar(max)).

Problem2

I will need to convert every <whatever> as [something] into @container = cast(<whatever> as nvarchar(max)) for each column of the main selection, but not for subselects, and I will need a general solution that also handles CASE WHEN expressions and parentheses. I do not want to have any instance of @container = anywhere except to the left of the columns of the main selection, as illustrated below.
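For example (names are made up), the transformation should turn this:

SELECT Foo.A as [A],
       (SELECT MAX(B) FROM dbo.Bar) as [MaxB],
       CASE WHEN Foo.X = 1 THEN 'yes' ELSE 'no' END as [IsX]
FROM   dbo.Foo;

into this, leaving the inside of the subselect untouched:

DECLARE @container nvarchar(max);

SELECT @container = cast(Foo.A as nvarchar(max)),
       @container = cast((SELECT MAX(B) FROM dbo.Bar) as nvarchar(max)),
       @container = cast(CASE WHEN Foo.X = 1 THEN 'yes' ELSE 'no' END as nvarchar(max))
FROM   dbo.Foo;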

Solution2

Since I am clueless about regular expressions, I can solve this by iterating over the query string until I find the FROM of the main query; each time I encounter an opening parenthesis, I do nothing until that parenthesis is closed. Along the way I find the indexes where @container = should be inserted and where as [customname] should be removed, and then apply those changes to the query string from right to left. This will be long and inelegant code.
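A minimal sketch of the scanning part (my own illustration, not a complete solution): it tracks parenthesis depth so that the FROM of a subselect is skipped and only the main query's FROM is found. A real implementation would also have to skip string literals, comments and bracketed identifiers.

DECLARE @query   nvarchar(max) = N'SELECT Foo.A, (SELECT MAX(B) FROM dbo.Bar) FROM dbo.Foo';
DECLARE @i       int = 1;
DECLARE @depth   int = 0;
DECLARE @fromPos int = 0;
DECLARE @c       nchar(1);

WHILE @i <= LEN(@query) AND @fromPos = 0
BEGIN
    SET @c = SUBSTRING(@query, @i, 1);
    IF @c = N'('
        SET @depth += 1;
    ELSE IF @c = N')'
        SET @depth -= 1;
    ELSE IF @depth = 0 AND UPPER(SUBSTRING(@query, @i, 5)) = N'FROM '
        SET @fromPos = @i; -- the first FROM at parenthesis depth 0 belongs to the main query
    SET @i += 1;
END;

SELECT @fromPos AS MainFromPosition;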

Question

Is it possible to make sure that all the columns of the main selection, but nothing else, start with @container = and end without an as [customname] alias?

Lajos Arpad
  • To ensure the execution plan doesn't change, consider executing the proc on the database server from SSMS with the discard query results option or from Powershell using ExecuteNonQuery. Although SQL Server will still return results, only API processing time will be considered (no network, rendering, or other processing). – Dan Guzman Sep 08 '17 at 16:59
  • @DanGuzman if I have @container = foo instead of foo as [bar], is the query plan possibly changed? – Lajos Arpad Sep 08 '17 at 17:01
  • If the data types of the source and target don't match exactly, a conversion will be needed (explicit or implicit) and that will affect performance. I can't say I've run into a case of a different plan with matching types, but I seldom use a SELECT for variable assignments when more than one row is returned. It might not work in all cases since the behavior is undefined, or the behavior might change depending on the SQL Server version, etc. Do so at your own risk. – Dan Guzman Sep 08 '17 at 17:24
  • @DanGuzman if I understand you correctly, you are saying that it is possible that the measurement will not be accurate due to the variable assignment and conversion, and you are suggesting that an ExecuteNonQuery should be executed. I need the performance to be measured on the SQL Server level and the values to be returned from there. If I call an ExecuteNonQuery, then I will need to do the measurements on either the Powershell level or the app level. – Lajos Arpad Sep 08 '17 at 17:32
  • You are correct - I'm suggesting you measure time at the client API level (sans client rendering/processing time). I know what you are asking but I don't think that is possible to accomplish reliably. Also, consider that the times should include the work done by the database engine to fill TDS protocol result buffers, which is part of what will happen when called from app code. – Dan Guzman Sep 08 '17 at 18:46
  • @DanGuzman I understand your point. I prefer to do several measurements in a single request and return only the measurement results, if possible. If not, then I will need to send a separate request for each measurement, measure them separately and accept the white noise due to message sending and receiving. – Lajos Arpad Sep 09 '17 at 13:31
  • @LajosArpad, why not write a CLR function that calls `ExecuteNonQuery`? It will be executed locally on the server, no network traffic is involved. You can put whatever complicated logic you need in the CLR code and the "noise" would be brought to the minimum. – Vladimir Baranov Sep 09 '17 at 14:13
  • @VladimirBaranov I would like to execute several measurements inside a single request and to avoid the noise instead of bringing it to a minimum. – Lajos Arpad Sep 09 '17 at 14:14
  • BTW, your goal of accurately measuring an abstract query is practically impossible to achieve. Is the server on which your measured queries run idle? Does it do anything other than measuring several queries run one after another? The performance of a query may be severely affected by other queries that run at the same time. You know, mutual locking and waiting for a common resource. To have a meaningful measure of the performance you need to reproduce the production load on the server. It is not trivial. – Vladimir Baranov Sep 09 '17 at 14:15
  • @VladimirBaranov it is not trivial at all, but since I know which queries were running at a given point, I can more or less reproduce a state very similar to the one that occurred. Also, repeated measurements will create a powerful hypothesis, which, even if unproven, will have a high chance of correctness. The problem is difficult and complicated, but not unsolvable. My question is about a concrete improvement on my current approach, which will not lead to a final version, but to a version which is much better than nothing. – Lajos Arpad Sep 09 '17 at 14:18
  • @VladimirBaranov total accuracy is unachievable, I agree. But we can approach that state to some degree. – Lajos Arpad Sep 09 '17 at 14:19

2 Answers


This is much too long for a comment but I'd like to add my $.02 to the other answers and share the scripts I used to test the suggested methods.

I like @MartinSmith's TOP 0 solution but am concerned that it could result in a different execution plan shape in some cases. I didn't see this in the tests I ran but I think you'll need to verify the plan is about the same as the unmolested query for each query you test. However, the results of my tests suggest the number of columns and/or data types might skew performance with this method.

The SQLCLR method in @VladimirBaranov's answer should provide the exact plan the app code generates (assuming identical SET options for the test), although there will still be some slight overhead (YMMV) from SqlClient consuming the results within the SQLCLR proc. There will be less server overhead with this method compared to returning results back to the calling application.

The SSMS discard results method I suggested in my first comment will incur more overhead than the other methods, but it does include the server-side work SQL Server performs not only to run the query but also to fill the buffers for the returned result. Whether or not this additional SQL Server work should be taken into account depends on the purpose of the test. For unit-level performance tests, I prefer to execute tests using the same API as the app code.

I captured server-side performance with these 3 methods for @MartinSmith's original query. The averages over 1,000 iterations on my machine were (cpu_time and duration are in microseconds, as reported by the Extended Events session below):

test method     cpu_time      duration    logical_reads
SSMS discard   53031.000000  55358.844000  7190.512000
TOP 0          52374.000000  52432.936000  7190.527000
SQLCLR         49110.000000  48838.532000  7190.578000

I did the same with a trivial query returning 10,000 rows and 2 columns (int and nvarchar(100)) from a user table:

test method     cpu_time      duration    logical_reads
SSMS discard    4204.000000  9245.426000   402.004000
TOP 0           2641.000000  2752.695000   402.008000
SQLCLR          1921.000000  1878.579000   402.000000

Repeating the same test but with a varchar(100) column instead of nvarchar(100):

test method     cpu_time      duration    logical_reads
SSMS discard    3078.000000  5901.023000   402.004000
TOP 0           2672.000000  2616.359000   402.008000
SQLCLR          1750.000000  1798.098000   402.000000

Below are the scripts I used for testing:

Source code for a SQLCLR proc like the one @VladimirBaranov suggested:

using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    [SqlProcedure]
    public static void ExecuteNonQuery(string sql)
    {
        // Execute the query on the context connection and discard the results.
        using (var connection = new SqlConnection("Context Connection=true"))
        {
            connection.Open();
            var command = new SqlCommand(sql, connection);
            command.ExecuteNonQuery();
        }
    }
}
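The deployment step is not shown here; a minimal sketch, assuming the assembly is compiled as QueryTimer.dll and copied to C:\clr (both the assembly name and the path are hypothetical):

CREATE ASSEMBLY QueryTimer
FROM N'C:\clr\QueryTimer.dll'
WITH PERMISSION_SET = SAFE; -- SAFE is sufficient for the context connection
GO
CREATE PROCEDURE dbo.ExecuteNonQuery
    @sql nvarchar(max)
AS EXTERNAL NAME QueryTimer.StoredProcedures.ExecuteNonQuery;
GO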

Extended Events session to capture the actual server-side timings and resource usage:

CREATE EVENT SESSION [test] ON SERVER 
ADD EVENT sqlserver.sql_batch_completed(SET collect_batch_text=(1))
ADD TARGET package0.event_file(SET filename=N'QueryTimes')
WITH (
      MAX_MEMORY=4096 KB
    , EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS
    , MAX_DISPATCH_LATENCY=30 SECONDS
    , MAX_EVENT_SIZE=0 KB
    , MEMORY_PARTITION_MODE=NONE
    , TRACK_CAUSALITY=OFF
    , STARTUP_STATE=OFF
    );
GO
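Starting the session and computing the averages is not shown above; a sketch of both steps (the file pattern passed to sys.fn_xe_file_target_read_file is an assumption and must point to where the event_file target actually writes):

ALTER EVENT SESSION [test] ON SERVER STATE = START;
-- ... run the test batches below ...
ALTER EVENT SESSION [test] ON SERVER STATE = STOP;

WITH ev AS (
    SELECT CAST(event_data AS xml) AS x
    FROM sys.fn_xe_file_target_read_file(N'QueryTimes*.xel', NULL, NULL, NULL)
)
SELECT AVG(x.value('(event/data[@name="cpu_time"]/value)[1]', 'bigint'))      AS avg_cpu_time,
       AVG(x.value('(event/data[@name="duration"]/value)[1]', 'bigint'))      AS avg_duration,
       AVG(x.value('(event/data[@name="logical_reads"]/value)[1]', 'bigint')) AS avg_logical_reads
FROM ev; -- in practice, also filter on the batch text of the method being measured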

User table create and load:

CREATE TABLE dbo.Foo(
      FooID int NOT NULL CONSTRAINT PK_Foo PRIMARY KEY
    , Bar1 nvarchar(100)
    , Bar2 varchar(100)
    );
WITH 
     t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
    ,t10k AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c CROSS JOIN t10 AS d)
INSERT INTO dbo.Foo WITH (TABLOCKX)
SELECT num, REPLICATE(N'X', 100), REPLICATE('X', 100)
FROM t10k;
GO

SQL script run from SSMS with the discard results query option to run 1000 iterations of the test with the 3 different methods:

SET NOCOUNT ON;
GO

--return and discard results
SELECT v.*,
       o.name
FROM   master..spt_values AS v
       JOIN sys.objects o
         ON o.object_id % NULLIF(v.number, 0) = 0;
GO 1000

--TOP 0
DECLARE @X NVARCHAR(MAX);

SELECT @X = (SELECT TOP 0 v.*,
                          o.name
             FOR XML PATH(''))
FROM   master..spt_values AS v
       JOIN sys.objects o
         ON o.object_id % NULLIF(v.number, 0) = 0;
GO 1000

--SQLCLR ExecuteNonQuery
EXEC dbo.ExecuteNonQuery @sql = N'
SELECT v.*,
       o.name
FROM   master..spt_values AS v
       JOIN sys.objects o
         ON o.object_id % NULLIF(v.number, 0) = 0;
'
GO 1000

--return and discard results
SELECT FooID, Bar1
FROM   dbo.Foo;
GO 1000

--TOP 0
DECLARE @X NVARCHAR(MAX);

SELECT @X = (SELECT TOP 0 FooID, Bar1
             FOR XML PATH(''))
FROM   dbo.Foo;
GO 1000

--SQLCLR ExecuteNonQuery
EXEC dbo.ExecuteNonQuery @sql = N'
SELECT FooID, Bar1
FROM   dbo.Foo
';
GO 1000

--return and discard results
SELECT FooID, Bar2
FROM   dbo.Foo;
GO 1000

--TOP 0
DECLARE @X NVARCHAR(MAX);

SELECT @X = (SELECT TOP 0 FooID, Bar2
             FOR XML PATH(''))
FROM   dbo.Foo;
GO 1000

--SQLCLR ExecuteNonQuery
EXEC dbo.ExecuteNonQuery @sql = N'
SELECT FooID, Bar2
FROM   dbo.Foo
';
GO 1000
Dan Guzman

I would try to write a single CLR function that runs as many queries as needed for the measurement. It may have a parameter with the text(s) of the queries to run, or the names of the stored procedures to run.

You have a single request to the server. Everything is done locally on the server. There is no network overhead. You discard the query results in the .NET CLR code, without using explicit temp tables, by using ExecuteNonQuery for each query that you need to measure.

Don't change the query that you are measuring. The optimizer is complex, and changes to the query may have various effects on performance.

Also, use SET STATISTICS TIME ON and let the server measure the time for you. Fetch what the server has to say, parse it and send it back in the format that suits you.

I think the results of SET STATISTICS TIME ON / OFF are the most reliable and accurate and have the least amount of noise.
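A minimal sketch of the measuring pattern (the query is taken from the test script above; the timing values in the comment are placeholders):

SET STATISTICS TIME ON;

SELECT FooID, Bar1
FROM   dbo.Foo;

SET STATISTICS TIME OFF;

-- The timings arrive as informational messages, e.g.:
--   SQL Server Execution Times:
--      CPU time = ... ms,  elapsed time = ... ms.

On a regular (non-context) SqlConnection, these messages can be captured in .NET through the SqlConnection.InfoMessage event and then parsed.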

Vladimir Baranov