2

We're experiencing performance issues in our SQL Server application (written in PHP, but we experience the same timings when running the queries in the management studio, so I don't think that's releveant).

The offending query is as follow:

SELECT
    c.name, t.name AS type, c.is_nullable, c.is_identity, 
    object_definition(c.default_object_id) AS default_value, 
    c.precision, c.scale, c.max_length, c.collation_name,
    CASE WHEN p.column_id IS NOT NULL THEN 1 ELSE 0 END AS is_primary, 
    CASE WHEN u.column_id IS NOT NULL THEN 1 ELSE 0 END AS is_unique 
FROM
    sys.columns AS c
    LEFT JOIN sys.types AS t
        ON c.user_type_id = t.user_type_id
    LEFT JOIN (
        SELECT DISTINCT
            ic.object_id, ic.column_id
            FROM sys.indexes ix
            JOIN sys.index_columns ic
                ON ix.object_id = ic.object_id
                AND ix.index_id = ic.index_id
            WHERE is_primary_key = 1
        ) AS p
        ON p.object_id = c.object_id AND p.column_id = c.column_id
    LEFT JOIN (
        SELECT DISTINCT
            ic.object_id, ic.column_id
            FROM sys.indexes ix
            JOIN sys.index_columns ic
                ON ix.object_id = ic.object_id
                AND ix.index_id = ic.index_id
            WHERE is_unique = 1
        ) AS u
        ON u.object_id = c.object_id AND u.column_id = c.column_id
WHERE
    c.object_id = object_id('tblTestTable');

Testing locally on SQL Server 2014 Express we get a first-run time of about 0.3s with subsequent runs between 0.1s and 0.2s. On our production server, running the full version of SQL Server 2014, the performance is worse!

I would expect a query such as this (using the system tables) to be running much faster, e.g. in the 0.01 -> 0.05 range, which is the kind of performance we get for similar queries against our own user tables.

  1. Should I expect a query like this to be slow?
  2. If yes, is there an alternative, faster method of getting this information? If no, what should we do to optimise it?

I notice that these system views don't appear to have indexes. Is that a factor?

Also of relevance is that this query originally used the INFORMATION_SCHEMA views, but these were at least twice as slow as the current performance we are getting using sys (though possibly that was because the sub-selects were in the field list rather than in the join).

Note that the timings come from the properties window in the management studio, and are consistent with what I get if I microtime() the query execution from within PHP.

UPDATE #1

I have constructed a query using our user data tables, which is basically the same structure as the one above (or as close as I could get).

This query runs at about 0.14s on first run, and then between 0.015 and 0.07 on subsequent runs. This is the kind of performance I would be expecting for the sys queries. Therefore, this appears to be an issue specific to the sys tables, rather than a general server configuration issue.

I can post the query here if it would be helpful, but will hold-off for now in case it's just spammy.

UPDATE #2

As requested, here is the stats output with SET STATISTICS TIME|IO ON for the original query, from a cold cache.

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

(56 row(s) affected)
Table 'sysiscols'. Scan count 112, logical reads 224, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysidxstats'. Scan count 112, logical reads 224, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 0, logical reads 448, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syssingleobjrefs'. Scan count 0, logical reads 112, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysscalartypes'. Scan count 0, logical reads 112, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 210 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
HappyDog
  • 1,230
  • 1
  • 18
  • 45
  • can you share execution plan as xml – TheGameiswar Sep 25 '17 at 11:41
  • If you need metadata queries to be fast, you're probably doing something wrong. This is the sort of thing you run once or every so often, not on the fly every time because your code doesn't know what your table structure is like. Note that getting the metadata of a result set (not a table in general) is an operation available to most client frameworks in some fashion, as type information is included. In T-SQL, this is provided by [`sp_describe_first_result_set`](https://learn.microsoft.com/sql/relational-databases/system-stored-procedures/sp-describe-first-result-set-transact-sql). – Jeroen Mostert Sep 25 '17 at 11:52
  • I've posted it here: https://pastebin.com/m7TFUZiW Note that the paste will expire in 1 month. – HappyDog Sep 25 '17 at 11:53
  • Also, think in batches where possible: getting metadata results for all tables at once is more efficient than querying them for every individual table in a loop, for example. The *best* approach, of course, is to *know* the table structures in advance, or where not possible, to at least cache them, as they usually don't change with every query. (You'd have bigger problems if that were the case.) – Jeroen Mostert Sep 25 '17 at 11:54
  • 1
    @JeroenMostert "If you need metadata queries to be fast, you're probably doing something wrong." Coming from a MySQL background, that statement surprises me. I would expect meta-data queries to be quick! A lot of PHP frameworks use the table definition to define the model automatically, so it is expected that this will be a relatively low-cost operation. Whilst it would be possible to cache the meta-data in a temp file, this intuitively sounds like it would be slower, but perhaps this is the only option for SQL server? – HappyDog Sep 25 '17 at 12:00
  • PS - The performance of the query seems to increase with the number of columns in the table, so I'm not sure that running all tables at once will improve things - at least not in the cases where only 1 table definition is required. I have not tested this yet, though. – HappyDog Sep 25 '17 at 12:02
  • Retrieving metadata for a *result set* is quick. Retrieving metadata for a table is a fairly complicated operation that requires gathering data from multiple tables (as you've seen) *and* something that blocks DDL operations, so it's not advisable to do that repeatedly. I haven't timed it, but just for fun, try `sp_describe_first_result_set N'select * from [table]'` and using the output of that. (Note: this does not actually select any rows from the table, so the size of the table is irrelevant.) Better than that, even in PHP, you can (and should) generate the code for the model in advance. – Jeroen Mostert Sep 25 '17 at 12:08
  • Performance of sp_describe_first_result_set is broadly the same as what I get with the direct query, and has a few fields missing from the result. – HappyDog Sep 25 '17 at 12:15
  • Can you update your question with the result of set statistics io, time on for this query please? – sepupic Sep 25 '17 at 13:09
  • Done........... (padded answer to make SO post limiter happy) – HappyDog Sep 25 '17 at 13:30

2 Answers2

4

System views do have indexes (the underlying system catalog tables have indexes, actually). First and foremost, start by identifying the slowness cause. Read How to analyse SQL Server performance. I would somehow doubt this to be root caused to an indexing issue (ie. driven by size-of-data in metadata views). It is much more probable you are experience blocking, for various reasons. Scanning system views is still subject to locks, and DDL operations (create/alter/drop) will block scans until the DDL commits.

Also, just apply some common sense query optimizations. You are filtering by c.object_id = object_id('tblTestTable'); the result, but perhaps the inner queries (SELECT DISTINCT ...FROM sys.indexes ix JOIN sys.index_columns ic) cannot push down this predicate. Try forcing it, ie. add the WHERE object_id = object_id('tblTestTable') clause to the inner queries as well.

Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
  • The query was being run on a local test instance which had no other activity. Even on the live site, where there will be other activity, there will be no DDL operations being applied (these happen out-of-hours, when necessary). Therefore blocking is probably not part of the problem. – HappyDog Sep 25 '17 at 12:08
  • @HappyDog the main point still stands: use a methodical approach to measure and identify the bottleneck – Remus Rusanu Sep 25 '17 at 12:09
  • I'll try. I've just tried adding the WHERE clause to the inner queries, and this doesn't seem to make a difference, but I will continue along the route you suggest. – HappyDog Sep 25 '17 at 12:11
  • OK - I've reduced the query to the following, which is still *really* slow, by my standards (i.e. 0.11 to 0.17 seconds on a warm cache, instead of 0.0x which I would expect): `SELECT c.name, c.is_nullable, c.is_identity, c.precision, c.scale, c.max_length, c.collation_name FROM sys.columns AS c`. Removing further columns speeds it up a bit, but really this query should not be that slow, surely! – HappyDog Sep 26 '17 at 11:00
  • @HappyDog You should also run `sp_helptext 'sys.columns'` to see the definition of the system views. Some are quite complex, and each row returned must pass the `has_access` check (row level security). – Remus Rusanu Sep 26 '17 at 12:28
  • Are you saying that these are complex views and that therefore I should therefore expect them to be slow? – HappyDog Sep 26 '17 at 14:17
  • I'm saying that you will get slower response than compared to a table of similar structure as the view. The underlying tables are optimized for metadata maintenance operations (find by Id and by name) and to prevent DDL excessive locking or even deadlocks. Performance of querying of the views on top of the system tables has to be 'good enough', not the main goal. – Remus Rusanu Sep 26 '17 at 14:23
  • If you update your answer (or add a new answer) with that information - i.e. that the query can't be meaningfully optimised (including the explanation why) then I'll accept it. I've reached the conclusion that I can't get any meaningful performance improvement and will need to re-architect to avoid this type of query. – HappyDog Sep 26 '17 at 15:14
0

Based on various comments in this thread and in my own research, I have concluded the following:

Q1. Should I expect a query like this to be slow?

Answer: Apparently, yes.

I reduced the query to the following, which is still really slow, by my standards (i.e. 0.11 to 0.17 seconds on a warm cache, instead of 0.0x seconds, which I would expect):

SELECT
    c.name, c.is_nullable, c.is_identity,
    c.precision, c.scale, c.max_length, c.collation_name
FROM
    sys.columns AS c;

If the straight query, with no joins or calculated fields, is this slow then I can only conclude that it is a limitation of SQL Server rather than something strange that I am doing.


Q2. If yes, is there an alternative, faster method of getting this information? If no, what should we do to optimise it?

Answer: It appears that there is no faster method, nor are there any meaningful optimisations to be made (assuming that all columns are required).

Here is a quote from Remus Rusanu in a comment to one of the other answers:

Are you saying that these are complex views and that therefore I should therefore expect them to be slow?

I'm saying that you will get slower response than compared to a table of similar structure as the view. The underlying tables are optimized for metadata maintenance operations (find by Id and by name) and to prevent DDL excessive locking or even deadlocks. Performance of querying of the views on top of the system tables has to be 'good enough', not the main goal.

Based on this, as well as some experiments of my own, it seems unlikely that there are any further optimisations that could be made to extract this information in a faster way, either through alternative query formulation or via different lookup mechanisms.


The solution, in my case

So, given the above, the only solution for our use-case is to cache the table schema locally, via the caching mechanisms provided by our framework. This requires that we run the re-cache script every time we update the DB schema, and runs the risk of the application blowing-up massively if schema changes are made without this script being run. However, it has removed this performance bottle-neck from our application, simply by removing the need to run the query during normal use.

Community
  • 1
  • 1
HappyDog
  • 1,230
  • 1
  • 18
  • 45