
I've always taken the approach of first deploying the database with a minimal set of indexes and then adding/changing indexes as performance dictates.

This approach works reasonably well. However, it still doesn't tell me where I could improve performance. It only tells me where performance is so bad that users complain about it.

Currently, I'm in the process of refactoring database objects on a lot of our applications.

So should I not bother to look for performance improvements since "premature optimization is the root of all evil"?

When refactoring application code, the developer is constantly looking for ways to improve the code quality. Is there a way to constantly be looking for improvements in database performance as well? If so, what tools and techniques have you found to be most helpful?

I've briefly played around with the Database Engine Tuning Advisor but didn't find it to be helpful at all. Maybe I just need more experience interpreting the results.

Ray
Chad Braun-Duin

14 Answers


My approach is to gather commands against the server or database into a table using SQL Server Profiler. Once you have that, you can query based on the max and avg execution times, max and avg cpu times, and (also very important) the number of times that the query was run.

Since I try to put all database access code in stored procedures it's easy for me to break out queries. If you use inline SQL it might be harder, since a change to a value in the query would make it look like a different query. You can try to work around this using the LIKE operator to put the same types of queries into the same buckets for calculating the aggregates (max, avg, count).

Once you have a "top 10" list of potential problems you can start looking at them individually to see if either the query can be reworked, an index might help, or making a minor architecture change is in order. To come up with the top 10, try looking at the data in different ways: avg * count for total cost during the period, max for worst offender, just plain avg, etc.
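For example, once the trace has been saved to a table, a query along these lines surfaces those aggregates (a sketch only: the table name dbo.TraceData is invented, but TextData, Duration, and CPU are standard Profiler trace columns):

-- Top offenders from a saved Profiler trace.
-- TextData is ntext in a trace table, so cast it before grouping.
SELECT TOP 10
    CAST(TextData AS NVARCHAR(4000)) AS QueryText,
    COUNT(*)                 AS Executions,
    AVG(Duration)            AS AvgDuration,
    MAX(Duration)            AS MaxDuration,
    AVG(CPU)                 AS AvgCpu,
    MAX(CPU)                 AS MaxCpu,
    AVG(Duration) * COUNT(*) AS TotalCost   -- "avg * count" for total cost during the period
FROM dbo.TraceData
GROUP BY CAST(TextData AS NVARCHAR(4000))
ORDER BY TotalCost DESC;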

Finally, be sure to monitor over different time periods if necessary. The database usage might be different in the morning when everyone is getting in and running their daily reports than it is at midday when users are entering new data. You may also decide that even though some nightly process takes longer than any other query it doesn't matter since it's run during off hours.

Good luck!

Tom H

"premature optimization is the root of all evil"

In terms of database programming, I think this quote is nonsense. It is extremely expensive to re-write your whole application because your developers don't care to write efficient code the first time. All T-SQL code should be thought of in terms of how it will affect database performance second (data integrity is, of course, first). Performance should trump everything except data integrity.

Yes, there are optimization things you shouldn't do until you have issues, but some things should be done as a matter of course and not fixed later. It takes no more time to write code that has a better chance of being efficient than code which will not be, once you understand how you are affecting efficiency with the bad code. Cervo's discussion of cursor code is one example. Set-based actions are almost always much faster than cursor solutions, so cursors should never be written initially when a set-based solution will do. It almost always takes me less time to write a set-based solution than it would to write a cursor, but the only way to get that way is to never write cursors.
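To illustrate the difference (a sketch with invented table and column names): the cursor visits one row at a time, while the set-based statement does the same work in a single pass.

-- Cursor version: updates dbo.Orders row by row.
DECLARE @OrderID INT;
DECLARE order_cursor CURSOR FOR
    SELECT OrderID FROM dbo.Orders WHERE ShippedDate IS NOT NULL;
OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @OrderID;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderID = @OrderID;
    FETCH NEXT FROM order_cursor INTO @OrderID;
END;
CLOSE order_cursor;
DEALLOCATE order_cursor;

-- Set-based version: one statement, typically orders of magnitude faster.
UPDATE dbo.Orders SET Status = 'Shipped' WHERE ShippedDate IS NOT NULL;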

And there is no reason to ever use select * instead of specifying your field names. In MSSQL you can drag those names over from the object explorer, so you can't tell me it's too hard to do that. By specifying only the fields you actually need, you save network resources, database server resources, and web server resources. So why should a programmer ever take the lazy option of select * and worry about optimizing later?

The same thing with indexes. You say you use a minimal set of indexes. Depending on how you define minimal, that could be OK, but it is critical to have indexes on all foreign keys, and I wouldn't want to push a database that didn't have indexes on the few fields that most often appear in where clauses. If your users are outside clients and not internal, they won't complain about how slow your site is; they will go elsewhere. It only makes business sense to plan for efficient database access from the start.
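Note that SQL Server does not index foreign keys automatically, so each one needs an explicit statement along these lines (table and column names invented for illustration):

CREATE NONCLUSTERED INDEX IX_OrderDetails_OrderID
    ON dbo.OrderDetails (OrderID);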

One of my main concerns about failing to consider efficiency from the beginning is that the first couple of times things are too slow, companies tend to just throw more equipment at the issue rather than performance tune. By the time people start performance tuning, you have a several-gigabyte or larger database with many unhappy customers who are getting timeouts more than results. At this point, often almost everything in the database has to be re-written, and in the meantime you are losing customers. I remember providing support at one company with a commercial application where it literally took ten minutes for the customer service reps to move from one screen to another while they were trying to help already disgruntled customers on the phone. You can imagine how many customers the company lost due to poorly designed database queries in the commercial product that we could not change.

HLGEM
  • I agree with this - database systems involve bits of metal going around and back and forth. These bits of metal are not getting exponentially faster in the way that ICs are, and getting the query wrong can make orders of magnitude of difference to the performance. – ConcernedOfTunbridgeWells Sep 22 '08 at 21:16
  • Indeed, I replaced cursors with set-based code that took the performance from minutes to milliseconds and sometimes from hours to seconds. No need to write this kind of code to begin with. – HLGEM Sep 22 '08 at 21:38
  • 1
    Premature optimization is about avoiding optimizing something when you do not have measurements on how much time/effort will be saved. Avoiding using cursors is not premature optimization as there is a very compelling and well documented improvement in performance to avoiding these. – Nat Jul 13 '11 at 23:09
  • To go along with @nat's point, there is compelling documentation of the performance difference between many different query methods that give the same results. Using sargable where clauses, using where exists instead of IN, and avoiding correlated subqueries are three things that spring to mind beyond just cursors. But there are whole books on performance tuning that teach you which techniques for writing queries are bad and which are good. No one should write SQL code (or use an ORM) until they have read one of these books from cover to cover. – HLGEM Feb 18 '14 at 20:01
  • To continue: Using known poorly performing techniques when the known better performing techniques take the same amount of time or less to code is just poor coding. It is usually because people are unaware that how the query is built makes a difference. – HLGEM Feb 18 '14 at 20:04

SQL Server Execution Plan! Go here: http://dbalink.wordpress.com/2008/08/08/dissecting-sql-server-execution-plans-free-ebook/
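For reference, a plan can also be captured straight from a query window using standard session options (the query below is just a placeholder):

SET STATISTICS PROFILE ON;   -- or press Ctrl+M in Management Studio for the graphical actual plan
SELECT TOP 10 name FROM sys.objects;  -- placeholder: run the query you want to inspect
SET STATISTICS PROFILE OFF;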

MarlonRibunal

After you profile, put the queries you see as troublesome into SQL Query Analyzer and display the execution plan. Identify portions of the queries that are performing costly table scans and re-index those tables to minimize this cost.

Try these references:

Optimizing SQL
How to Optimize Queries

Arthur Miller

Profile your queries: not the obvious ones, but the complex ones that access different tables, views, etc., and/or the ones that return many rows from different tables.

That will tell you exactly where you should focus

BlackTigerX

Profiling is key, but when using a profiling set you MUST be sure that it is an accurate test set of data; otherwise the tuning tools will not be able to give you an accurate result.

Also, the dynamic management views with fragmentation and usage reporting in SQL Server 2005 are very helpful!
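For instance, the fragmentation and usage data come from documented DMVs (a sketch; run it in the database you want to inspect):

-- Fragmentation per index:
SELECT OBJECT_NAME(ips.object_id) AS TableName,
       i.name AS IndexName,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id;

-- How often each index is read versus merely maintained:
SELECT OBJECT_NAME(ius.object_id) AS TableName,
       i.name AS IndexName,
       ius.user_seeks, ius.user_scans, ius.user_updates
FROM sys.dm_db_index_usage_stats AS ius
JOIN sys.indexes AS i
  ON i.object_id = ius.object_id AND i.index_id = ius.index_id
WHERE ius.database_id = DB_ID();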

Mitchel Sellers

Of course you have to profile your queries and look at the execution plan. But the two main things that come up over and over again are: filter out as much as you can as soon as you can, and try to avoid cursors.

I saw an application where someone downloaded an entire database table of events to a client and then went through each row one by one, filtering based on some criteria. There was a HUGE performance increase in passing the filter criteria to the database and having the query apply the criteria in a where clause. This is obvious to people who work with databases, but I have seen similar things crop up. Also, some queries populate a bunch of temp tables with rows that they don't need, which are then eliminated in a final join of the temp tables. If you filter in the queries that populate the temp tables instead, there is less data for the rest of the query and the whole thing runs faster.
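The first point is worth spelling out (names invented; the real criteria will vary):

-- Anti-pattern: fetch every row and filter in application code.
SELECT EventID, EventDate, EventType FROM dbo.Events;

-- Better: push the criteria into the WHERE clause so the database filters.
SELECT EventID, EventDate, EventType
FROM dbo.Events
WHERE EventType = 'Error'
  AND EventDate >= '20080101';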

Cursors are obvious. If you have a million rows and go row by row, it will take forever. In tests, even if you connect to the database from a "slow" dynamic language like Perl and perform the row-by-row operation there, the speed is still much greater than a cursor in the database. Do it with something like Java/C/C++ and the speed difference is even bigger. If you can find and eliminate a cursor in the database code, it will run much faster. If you must use a cursor, rewriting that part in any programming language and getting it out of the database will probably yield huge performance increases.

One more note on cursors: beware code like SELECT @col1 = col1, @col2 = col2, @col3 = col3 WHERE id = @currentid in a loop that goes through IDs and then executes statements on each column. Basically this is a cursor as well. Not only that, but using real cursors is often faster than this, especially STATIC and FORWARD_ONLY ones. If you can change the operation to be set-based it will be much faster. That being said, cursors have a place for some things, but from a performance perspective there is a penalty to using them over set-based approaches.

Also beware the execution plan. Sometimes it estimates operations that take seconds to be very expensive and operations that take minutes to be very cheap. When viewing an execution plan make sure to check everything, maybe by inserting some SELECT 'At this area', GETDATE() into your code.
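The checkpoint trick looks like this in practice (placeholder steps):

SELECT 'before suspect step', GETDATE();
-- ... the statement under suspicion goes here ...
SELECT 'after suspect step', GETDATE();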

Cervo

My advice is that "premature optimization is the root of all evil" is absolute nonsense in this context.

In my view it's all about design - you need to think about concurrency, hotspots, indexing, scaling, and usage patterns when you are designing your data schema.

If you don't know what indexes you need and how they need to be configured right off the bat, without doing profiling, you have already failed.

There are millions of ways to optimize query execution that are all well and good but at the end of the day the data lands where you tell it to.


Apply proper indexing to the table columns in the database

  • Make sure that every table in your database has a primary key.

This will ensure that every table has a clustered index created (and hence the corresponding pages of the table are physically sorted on disk according to the primary key field). So any data retrieval from the table using the primary key, any sort on the primary key field, or any range of primary key values specified in the WHERE clause will retrieve data from the table very fast.
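A minimal sketch (invented table) showing that the primary key creates the clustered index by default:

CREATE TABLE dbo.Accounts
(
    AccountID INT          NOT NULL PRIMARY KEY,  -- clustered index created here by default
    Name      NVARCHAR(50) NOT NULL
);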

  • Create non-clustered indexes (see the example after this list) on columns which are:

    Frequently used in the search criteria.

    Used to join other tables.

    Used as foreign key fields.

    Of high selectivity (a particular value matches only a low percentage, 0-5%, of the total rows).

    Used in the ORDER BY clause.
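For the criteria above, the statement looks like this (names invented; INCLUDE, available from SQL Server 2005, also makes it a covering index for queries that read Amount):

CREATE NONCLUSTERED INDEX IX_Investments_Name
    ON dbo.Investments (Name)
    INCLUDE (Amount);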

Don't use "SELECT*" in a SQL query

Unnecessary columns may get fetched, which adds expense to the data retrieval time. The database engine also cannot utilize the benefit of a covering index, so the query performs slowly.

Example:

SELECT Cash, Age, Amount FROM Investments; 

Instead of:

SELECT * FROM Investments;

Try to avoid the HAVING clause in SELECT statements

The HAVING clause is used to filter rows after all the rows have been selected; it acts like a final filter. Try not to use the HAVING clause for any other purpose.

Example:

SELECT Name, COUNT(Name) FROM Investments WHERE Name != 'Test' AND Name != 'Value' GROUP BY Name;

Instead of:

SELECT Name, COUNT(Name) FROM Investments GROUP BY Name HAVING Name != 'Test' AND Name != 'Value';

Try to minimize the number of subquery blocks within a query

Sometimes we may have more than one subquery in our main query. We should try to minimize the number of subquery blocks in our query.

Example:

SELECT i.Amount FROM Investments AS i
JOIN (SELECT MAX(Cash) AS MaxCash, MAX(Fixed) AS MaxFixed FROM Retirements) AS r
  ON i.Cash = r.MaxCash AND i.Fixed = r.MaxFixed
WHERE i.Goal = 1;

Instead of:

SELECT Amount FROM Investments WHERE Cash = (SELECT MAX(Cash) FROM Retirements) AND Fixed = (SELECT MAX(Fixed) FROM Retirements) AND Goal = 1;

Avoid unnecessary columns in the SELECT list and unnecessary tables in join conditions

Selecting unnecessary columns in a Select query adds overhead to the actual query, especially if the unnecessary columns are of LOB types. Including unnecessary tables in join conditions forces the database engine to retrieve and fetch unnecessary data and increases the query execution time.

Do not use the COUNT() aggregate in a subquery to do an existence check

When you use COUNT(), SQL Server does not know that you are doing an existence check. It counts all matching values, either by doing a table scan or by scanning the smallest non-clustered index. When you use EXISTS, SQL Server knows you are doing an existence check. When it finds the first matching value, it returns TRUE and stops looking.
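In other words, prefer the second form below (table and variable invented for illustration):

DECLARE @CustomerID INT;
SET @CustomerID = 42;

-- Counts every matching row before answering:
IF (SELECT COUNT(*) FROM dbo.Orders WHERE CustomerID = @CustomerID) > 0
    PRINT 'Customer has orders';

-- Stops at the first match:
IF EXISTS (SELECT 1 FROM dbo.Orders WHERE CustomerID = @CustomerID)
    PRINT 'Customer has orders';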

Try to avoid joining between columns of two different data types

When joining between two columns of different data types, one of the columns must be converted to the type of the other. The column whose type is lower is the one that is converted. If you are joining tables with incompatible types, one of them can use an index, but the query optimizer cannot choose an index on the column that it converts.

Try not to use COUNT(*) to obtain the record count in a table

To get the total row count in a table, we usually use the following Select statement:

SELECT COUNT(*) FROM [dbo].[PercentageForGoal]

This query will perform a full table scan to get the row count. The following query would not require a full table scan. (Please note that this might not give you 100% perfect results always, but this is handy only if you don't need a perfect count.)

SELECT rows FROM sysindexes
WHERE id = OBJECT_ID('[dbo].[PercentageForGoal]') AND indid < 2

Try to use operators like EXISTS, IN, and JOINs appropriately in your query

  • Usually IN has the slowest performance.
  • IN is efficient only when most of the filter criteria for selection are placed in the subquery of a SQL statement.
  • EXISTS is efficient when most of the filter criteria for selection are in the main query of a SQL statement.

Try to avoid dynamic SQL

Unless really required, try to avoid the use of dynamic SQL. It is hard to debug and troubleshoot, and if user input is concatenated into it, there is a possibility of SQL injection attacks.

Try to avoid the use of temporary tables

Unless really required, try to avoid the use of temporary tables. Rather, use table variables. For small result sets, table variables are usually faster, as they avoid some of the logging and recompilation overhead of temporary tables. Temporary tables reside in the TempDb database, so heavy use of them adds load there.
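The two forms compared here are (illustrative only):

-- Table variable:
DECLARE @Totals TABLE (CustomerID INT, Total MONEY);

-- Temporary table:
CREATE TABLE #Totals (CustomerID INT, Total MONEY);
DROP TABLE #Totals;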

Instead of LIKE search, use full text search for searching textual data

Full text searches generally outperform LIKE searches. Full text search also enables complex search criteria that can't be implemented using LIKE, such as searching on a single word or phrase (and optionally ranking the result set), searching on a word or phrase close to another word or phrase, or searching on synonymous forms of a specific word. Full text search is also easier to implement than LIKE, especially in the case of complex search requirements.

Try to use UNION to implement an "OR" operation

Try not to use "OR" in a query. Instead use "UNION" to combine the result set of two distinguished queries. This will improve query performance. Better use UNION ALL if a distinguished result is not required. UNION ALL is faster than UNION as it does not have to sort the result set to find out the distinguished values.

Implement a lazy loading strategy for large objects

Store Large Object columns (like VARCHAR(MAX), Image, Text etc.) in a different table than the main table, and put a reference to the large object in the main table. Retrieve all the main table data in a query, and if a large object is required to be loaded, retrieve the large object data from the large object table only when it is required.

Implement the following good practices in User Defined Functions

Do not call functions repeatedly within your Stored Procedures, triggers, functions, and batches. For example, you might need the length of a string variable in many places of your procedure, but don't call the LEN function whenever it's needed; instead, call the LEN function once, and store the result in a variable for later use.
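A tiny example of that caching:

DECLARE @Value NVARCHAR(100);
DECLARE @Length INT;
SET @Value = N'some input string';
SET @Length = LEN(@Value);   -- computed once...
-- ...reuse @Length below instead of calling LEN(@Value) again.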

Implement the following good practices in Triggers

  • Try to avoid the use of triggers. Firing a trigger and executing the triggering event is an expensive process.
  • Never use triggers that can be implemented using constraints.
  • Do not use the same trigger for different triggering events (Insert, Update and Delete).
  • Do not use transactional code inside a trigger. The trigger always runs within the transactional scope of the code that fires the trigger.
Dhyan Mohandas

It seems that you're talking about MS SQL.

Start the profiler and record the most common queries you run against the database. Then run those queries with the Execution Plan turned on and you will see what (if anything) is slowing your queries down. You could then go and optimize the queries or add more indexes to your fields.

SQL Server Books Online will give you a good overview of both the profiling and query analysis functionality.

Ilya Kochetov

You might want to check internal and external fragmentation of the current indexes and either drop and re-create them or reorganize them.
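In SQL Server 2005 and later the two options look like this (index and table names invented):

ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REORGANIZE;  -- lightweight, online defragmentation
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD;     -- drops and re-creates the index in one step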

Charles Graham

Make sure you are profiling using production volumes - in terms of both number of rows and load. The queries and their plans behave differently under different load/volume scenarios.

Simon Munro

Generally, the tips here:

http://www.sql-server-performance.com/

have been high quality and useful for me in the past.

Sean

My advice would be to start with techniques applicable to all databases and then try the ones specific to MS SQL.

Optimizing SQL is difficult, and there are no hard and fast rules. There are very few generic guidelines that you can follow, such as:

  • 95% of performance improvements will come from the application, not from server or database engine configuration.
  • Design for correctness first, tweak for performance later
  • Reduce trips to the database
  • Try to express things in a way that fits your data model
  • Ignore generic advice about performance - yes, at some point you'll find a system or SQL statement where one of those rules does not apply.

But the key point is that you should always apply the 80-20 rule: in any system you need to tweak 20% (often much less) of your code for the biggest performance gains. That's where vendor-provided tools usually fail, as they cannot usually guess the application/business context of execution.