
I have an assignment for a business which is basically just about extracting data from a database (Microsoft SQL Server 2008). In the process, the users will be able to choose which columns to select, choose which view to select from, and build the WHERE clause. Based on what the user chooses, the SQL query is constructed accordingly. A requirement is that the user may select ANY column from ANY view and filter by ANY column in the WHERE clause. The company does not want the solution to use data warehouse/OLAP, and wants to limit any third party software. So basically they just want a .NET Windows Forms application which dynamically constructs SQL queries based on a GUI and connects to a database.
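To make the idea concrete, here is a sketch of the kind of statement such a tool might generate (the view and column names here are hypothetical). Identifiers chosen in the GUI would be validated against a whitelist built from the catalog, while user-entered *values* travel as parameters via `sp_executesql` rather than being concatenated into the SQL text:

```sql
-- Hypothetical example: dbo.vOrders and its columns stand in for
-- whatever view/columns the user picked. Only values are parameters;
-- identifiers are validated separately before being placed in the text.
DECLARE @sql nvarchar(max) = N'
    SELECT CustomerName, OrderDate      -- columns picked in the GUI
    FROM dbo.vOrders                    -- view picked in the GUI
    WHERE OrderDate >= @from AND Status = @status';

EXEC sp_executesql
    @sql,
    N'@from datetime, @status int',
    @from = '2012-01-01', @status = 2;
```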

My concern here is how to optimize the queries. I am by no means good at optimizing SQL queries yet, but my first thought was: what if the user chooses to filter by a column that has no index (in the WHERE clause)? By giving users so much flexibility, they can potentially construct queries that are so inefficient that they take a long time to execute.

I realize that performance can never be good with a lot of data if they filter on columns that have no indexes, but is there anything I can do to improve it? I certainly cannot just add indexes to all columns.

I am not necessarily just looking for query optimization; I am also wondering whether there are any server tweaks I can make, such as caching. Basically, I am all ears and looking for any advice that can help me improve the performance.

Any suggestions?

Thank you in advance!

ba0708
  • This is one of the scenarios for which column store indexes are supposed to help (not available until next version!). Not sure what you mean by not using a DW. Is this querying an OLTP database that is also subject to concurrent data modifications hence the reluctance to add indexes? – Martin Smith Jan 12 '12 at 13:46
  • Yes, the database will more often than not be an OLTP database that is used for many other things. Therefore, adding many indices would slow down inserts/updates much, I believe. – ba0708 Jan 12 '12 at 13:56
  • Doubt there's much you can do then based upon that specification (apart from obvious hardware upgrades). Performance will probably suck. Why have they discounted OLAP/DW for this? – Martin Smith Jan 12 '12 at 13:59
  • @MartinSmith is right - given these constraints you really can't optimize too much. You can have **EITHER** flexibility or good performance. Swiss army knives can do a lot, but they aren't the best at anything that they do. – JNK Jan 12 '12 at 14:01
  • Yes, this was what I was afraid of. The unfortunate thing is that the efficiency greatly depends on the number of rows. Customers with millions of rows will be left unsatisfied, but the company is aware of this. There are several reasons why they don't want DW/OLAP; this system is like a complementary product that not all customers use, and DW would probably be too costly for many customers. Also, many customers would probably not be interested in this. Then there is also the cost (even though there are open source projects out there). Mostly it's due to the complexity for the customers. – ba0708 Jan 12 '12 at 14:20
  • Well until self tuning databases come along there isn't really a solution that I can think of. Apart from anticipating key queries in advance or providing some self service index creation facility based on missing index info in the plan - which could cause more problems than it solves! – Martin Smith Jan 12 '12 at 14:26
  • That was my conclusion as well, but as I stated, I am not so experienced with performance tuning. Thank you for your help! – ba0708 Jan 12 '12 at 15:30
  • everything you need is in these: [Dynamic Search Conditions in T-SQL](http://www.sommarskog.se/dyn-search.html) and [The Curse and Blessings of Dynamic SQL](http://www.sommarskog.se/dynamic_sql.html) – KM. Jan 12 '12 at 18:09
  • @KM. Could you please elaborate what you suggest exactly? I have spent some time looking through those pages and I didn't see the light. Perhaps that's just because many of those things are new to me. – ba0708 Jan 14 '12 at 12:22
  • your question is a little too "large" to just answer. I've pointed you to two articles which are the main point of your problem: building dynamic SQL and building dynamic search conditions. My suggestions would be to ask several questions of more narrow scope and go from there. – KM. Jan 16 '12 at 12:39
  • Okay, this is a non-RDBMS-specific suggestion and might come out as pretty stupid, but depending on what you want to query for what about creating a dynamic index based on the columns the query involves on the run before running the actual query? Sometimes the benefit might exceed the cost, specially if the query involves a lot of post-processing (e.g. `sort by`, `group by`). You probably can't drop the index right away after the query is run but you could schedule the application/database to drop it after a while. – Piovezan Dec 16 '20 at 18:51

2 Answers


You really cannot do much except foresee what users are likely to do. You are in a good position, though, in that the SQL Server optimizer does the hard work for you (imagine building this on a key-value store!).

I would create indexes on the columns most likely to be filtered or sorted on. Consider filtering those indexes to non-NULL values, which reduces their storage cost (assuming users will not filter for NULL values).
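A filtered index along those lines might look like this (table and column names are hypothetical):

```sql
-- Index only the non-NULL rows of a frequently filtered column.
-- A query filtering on a non-NULL Region value can use this index,
-- and the NULL rows cost no index storage at all.
CREATE NONCLUSTERED INDEX IX_Orders_Region
ON dbo.Orders (Region)
WHERE Region IS NOT NULL;
```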

You can also try to precompute common joins and aggregations using indexed views. If you are willing to throw insane amounts of RAM at this problem, and can live with slow writes, you can index and materialize the hell out of this database.
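As a sketch of what such an indexed view could look like (names are hypothetical, and `Amount` is assumed NOT NULL, since indexed views disallow SUM over nullable expressions):

```sql
-- Precompute an aggregation. SCHEMABINDING, two-part table names and
-- COUNT_BIG(*) are all required before the view can be indexed.
CREATE VIEW dbo.vSalesByRegion
WITH SCHEMABINDING
AS
SELECT Region,
       SUM(Amount)  AS TotalAmount,
       COUNT_BIG(*) AS RowCnt
FROM dbo.Sales
GROUP BY Region;
GO

-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vSalesByRegion
ON dbo.vSalesByRegion (Region);
```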

Finally, you can offload users' queries to a read-only log-shipping target or the like. This will sandbox their horrible queries.

For your queries, you need to parameterize them, but you do not need to cache their plans in all cases. If your queries tend to be expensive (so compilation time is inconsequential), you will want to run them with OPTION (RECOMPILE) so SQL Server can adapt to the exact runtime values of all parameters.
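For example (names hypothetical), a parameterized query can carry the hint inside its text:

```sql
-- Parameterized, but recompiled on each run so the optimizer sees the
-- actual parameter values. Worthwhile when compile time is small
-- relative to the query's execution cost.
EXEC sp_executesql
    N'SELECT CustomerName, OrderDate
      FROM dbo.vOrders
      WHERE Region = @region AND OrderDate >= @from
      OPTION (RECOMPILE)',
    N'@region nvarchar(50), @from datetime',
    @region = N'West', @from = '2012-01-01';
```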

You should also monitor all queries and review them to look for patterns. Your users are likely to run very similar queries all the time. Index for them.

Run `sp_updatestats` regularly.

Finally, I want to say that there is no very effective solution to this, because if there were, SQL Server would implement it itself so everyone could benefit.

usr

First, to improve SQL Server's ability to optimize, cache, and compile queries/statements:

  • Make sure the UI supports IN and BETWEEN while letting users build their own WHERE clause.
  • Sort your AND or OR conditions so that indexed columns are first, followed by alphabetical order of the other columns.
    • If you're allowing nested ANDs and ORs in your WHERE clause this may be more difficult
  • Use **parameterized queries**
-- arbitrary ordering:
WHERE C1 = 'foo' AND C3 = 'bar' AND C2 = 42
-- if C3 is an indexed column, put it first and parameterize the values:
WHERE C3 = @parm1 AND C1 = @parm2 AND C2 = @parm3
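Putting those points together, a sketch of how the application side might assemble such a statement (all names hypothetical): the view name, being an identifier, is concatenated only after validation against the catalog, while the values stay as parameters so identical column choices reuse one cached plan:

```sql
DECLARE @view sysname = N'vOrders';          -- user's choice from the GUI
-- Validate the identifier against the catalog before concatenating it
IF EXISTS (SELECT 1 FROM sys.views WHERE name = @view)
BEGIN
    DECLARE @sql nvarchar(max) =
        N'SELECT C1, C2 FROM dbo.' + QUOTENAME(@view) +
        N' WHERE C3 = @p1 AND C1 = @p2';     -- indexed column C3 first
    EXEC sp_executesql @sql, N'@p1 int, @p2 nvarchar(50)',
         @p1 = 42, @p2 = N'foo';
END;
```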

Second, to empower users

  • When listing the columns the user can choose from, list the indexed columns first or make them recommended columns to use.
  • Build in some recording of the columns that users choose and the time their query takes to complete. Having this information can help you tune the database in the future and improve the user experience.

**EDIT** Regarding Martin Smith's comment on ordering AND/OR conditions: what I was referring to is called short-circuiting.

Consider the logic

A = True OR B = True OR C = True

If A is indeed True, there's no need to evaluate B or C for the condition to be true.

A = True AND B = True AND C = True

In this case, if A is False there is no need to evaluate B or C for the condition to be false.

Louis Ricci
  • Sorting `Or` conditions so indexed columns are first will make no difference – Martin Smith Jan 12 '12 at 18:36
  • @Martin Smith - I was under the impression that SQL Server would short circuit Boolean expressions. Your comment is kind of glib, so if you could elaborate it'd be helpful. The other point to the ordering and parameterization is so queries using the same columns with different parameters will be cached for reuse. – Louis Ricci Jan 12 '12 at 19:14
  • @LastCoder - [SQL Server does not short circuit](http://stackoverflow.com/questions/381224/sql-server-query-short-circuiting) – Lieven Keersmaekers Jan 13 '12 at 09:41
  • Thank you. We are already using parameterized queries in most cases, but is this possible when the columns and view to select from are also dynamic? Some colleagues of mine told me that they could not make it work. My logic tells me that's because the client is sending the parameterized query to the SQL server first, but the server is not able to make an execution plan without knowing which view/table to select from. Is this correct, and is there any workaround for this? – ba0708 Jan 14 '12 at 12:21
  • @andy124 - You should consider using sp_executesql; you may have to build the view/table name into your SQL string and parameterize only the WHERE clause variables. http://msdn.microsoft.com/en-us/library/ms175170.aspx – Louis Ricci Jan 15 '12 at 00:38
  • We are currently using parameterized SQL queries in ADO.NET. What would the advantages be for us to use stored procedures instead? According to MSDN, there is a performance advantage, mainly due to the reuse of execution plans. However, I thought this was also the case for parameterized queries in ADO.NET, but perhaps these are only reused for a limited time? I am only guessing here... It seems as if it is possible to use dynamic table/view names with stored procedures according to the insert dates example: http://msdn.microsoft.com/en-us/library/ms188001.aspx – ba0708 Jan 15 '12 at 11:25