-2

Possible Duplicate:
select * vs select column

I was just having a discussion with one of my colleague on the SQL Server performance on specifying the query command in the stored procedure.

So I want to know which one is preferred over another and whats the concrete reason behind that.

Suppose, We do have one table called Employees(EmpName,EmpAddress)

And we want to select all the records from the table. So we can write the query in two ways,

  1. Select * from Employees

  2. Select EmpName, EmpAddress from Employees

So I would like to know is there any specific difference or performance issue in the above queries or are they just equal to the SQL Server Engine.

UPDATE:

  1. Lets say the table schema won't change anymore. So no point for future maintenance.
  2. Performance wise, lets say, the usage is very very high i.e. millions of hits per seconds on the database server. I want a clear and precise performance rating on both approaches.
  3. No Indexing is done on the entire table.
Community
  • 1
  • 1
Manish Rawat
  • 1,142
  • 1
  • 18
  • 34
  • 1
    They are equal until you `alter` the `table`. – ypercubeᵀᴹ Jan 21 '13 at 18:00
  • Lets say they won't get altered in the entire application life time, I am just looking for the performance even upto nano-seconds. – Manish Rawat Jan 21 '13 at 18:18
  • 2
    You have no indexes on the table and you are worried about the performance of select * vs. select col1,col2? I think you need to consider your priorities. – Aaron Bertrand Jan 21 '13 at 18:29
  • 4
    Guys, they are equal only if there are no other columns *and* there is no narrower, covering index. – Aaron Bertrand Jan 21 '13 at 18:30
  • @AaronBertrand All I want to say that there will be no more columns to get added in the table anymore and there is no index added, as I have heard people saying that if you choose specific columns which are part of the indexes then the query will execute faster, so to avoid those indexes perforamnce answers i specify the third point. – Manish Rawat Jan 21 '13 at 18:34
  • Then they are equal, but let me ask you this: if the table is not going to change, what do you gain by using * instead of typing out the column names? You saved half a second not having to type the column names? Don't choose * in the one case where you can be lazy since you shouldn't select that option in every other case you will come across. – Aaron Bertrand Jan 21 '13 at 18:40
  • 1
    Let me put that another way: SELECT * will never be *MORE* efficient than naming the columns. Do yourself a favor and just resign yourself to not be lazy even when the performance is the same. – Aaron Bertrand Jan 21 '13 at 18:41
  • @AaronBertrand does sql server perform really faster if you are a non-lazy programmer. And by the way, lazy people invent bicycle as they don't want to walk hundreds of meters daily. Programmers made their lives better by writing lesser code, so why opt for Select col1,col2 which could take several more seconds huh... :P – Manish Rawat Jan 21 '13 at 18:47
  • 3
    @Manish because you only type the code ONCE. When you do that, you should be verifying it for accuracy, building a proper interface with the consuming application (you can't just use * in C#, for example, when displaying the data), etc. And you should not be relying on similar performance from this one scenario since it won't be true in almost every other scenario on earth (how many tables NEVER change? how many tables don't have indexes?). Again, you are focusing completely and wholly on the wrong aspect of efficiency. – Aaron Bertrand Jan 21 '13 at 18:53

4 Answers4

8

The specific difference would show its ugly head if you add a column to the table.

Suddenly, the query you expected to return two columns now returns three. If you coded specifically for the two columns, the rest of your code is now broken.

Performance-wise, there shouldn't be a difference.

I always take the approach that being as specific as possible is the best when dealing with databases. If the table has two columns and you only need those two columns, be specific. Specify those two columns. It'll save you headaches in the future.

Justin Niessner
  • 242,243
  • 40
  • 408
  • 536
  • consider DataRow[1] is in your code and someone adds a column in the DB scheme before that column you meant... – eFloh Jan 21 '13 at 18:03
7

You should always reference columns explicitly. This way, if the table structure changes (and such changes are made in an intelligent, backward-compatible way), your queries will continue to work and can be modified over time.

Also, unless you actually need all of the columns from the table (not typical), using SELECT * is bringing more data to your application than is necessary, and potentially forcing a clustered index scan instead of what might have been satisfied by a narrower covering index.

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • Ohhk, I understand the indexing and backward compantibility issues with the select *, but if however if we are all makesure that there will be no changes in the entire schema of the table then also you will prefer to speicify columns explicitly. – Manish Rawat Jan 21 '13 at 18:27
  • @ManishRawat are these the only two columns in the table? If your concern is performance, why are you dead-set against indexes? (Hint: they will have a much greater impact on most of your queries than the result of the choice you're making about select *). – Aaron Bertrand Jan 21 '13 at 18:34
  • I removed the index from the context to make sure that nobody say that indexes are much more performance oriented and if you specify the indexed columns then its a perforamnce boost again. – Manish Rawat Jan 21 '13 at 18:42
7

I am an avid avokat of the "be as specific as possible" rule, too. Not following it will hurt you in the long run. However, your question seems to be coming from a different background, so let me attempt to answer it.


When you submit a query to SQL Server it goes through several stages:

  1. transmitting of query string over the network.
  2. parsing of query string, producing a parse-tree
  3. linking the referenced objects in the parse tree to existing objects
  4. optimizing based on statistics and row count/size estimates
  5. executing
  6. transmitting of result data over the network

Let's look at each one:

  1. The * query is a few bytes shorter, so step this will be faster
  2. The * query contains fewer "tokens" so this should(!) be faster
  3. During linking the list of columns need to be puled and compared to the query string. Here the "*" gets resolved to the actual column reference. Without access to the code it is impossible to say which version takes less cycles, however the amount of data accessed is about the same so this should be similar.
  4. -6. In these stages there is no difference between the two example queries, as they will both get compiled to the same execution plan.

Taking all this into account, you will probably save a few nanoseconds when using the * notation. However, you example is very simplistic. In a more complex example it is possible that specifying as subset of columns of a table in a multi table join will lead to a different plan than using a *. If that happens we can be pretty certain that the explicit query will be faster.

The above comparison also assumes that the SQL Server process is running alone on a single processor and no other queries are submitted at the same time. If the process has to yield during the compilation those extra cycles will be far more than the ones we are trying to save.

So, the amont of saving we are talking about is very minute compared to the actual execution time and should not be used as an excuse for a "bad" coding practice.

I hope this answers your question.

Sebastian Meine
  • 11,260
  • 29
  • 41
1

Performance wise there are no difference between those 2 i think.But those 2 are used in different cases what may be the difference.
Consider a slightly larger table.If your table(Employees) contains 10 columns,then the 1st query will retain all of the information of the table.But for 2nd query,you may specify which columns information you need.So when you need all of the information of employees no.1 is the best one rather than specifying all of the column names.
Ofcourse,when you need to ALTER a table then those 2 would not be equal.

ridoy
  • 6,274
  • 2
  • 29
  • 60
  • But @ridoy I have read somewhere that Select * is converted into Select col1,col2 at the runtime and that also make a performance hit on using the Select *. Is that correct ? – Manish Rawat Jan 21 '13 at 18:29
  • That performance impact is so negligible that it isn't worth the energy required to talk about it. The other effects mentioned here are much, much, much more important. – Aaron Bertrand Jan 21 '13 at 18:35
  • I also agree with Aaron. – ridoy Jan 21 '13 at 18:38
  • @AaronBertrand , so even single penny from each person on this earth could make you millionaire. So every single nano-second you save on your query will give you a great performance boost. So as per my understanding if i add/remove some valid points to the answer then i will get to know exactly on which part the two queries differ. For updates: please refer to the question as I have made some changes in the question. – Manish Rawat Jan 21 '13 at 18:39
  • @Manish my points still stand: SELECT * will never be MORE efficient from a performance perspective (maybe only a few seconds saved on typing). So why use it, when even to only get *identical* performance as SELECT col1, col2 you can only do that in those very specific cases - where those are *ALL* the columns in the table (or there is no covering index), and when you really do need *ALL* the columns in the table. Are you trying to find evidence that SELECT * is more efficient? Why, and how long are you going to spend trying to find it? (Hint: you won't.) – Aaron Bertrand Jan 21 '13 at 18:44
  • Can you let me know any benchmarking tool in which i can benchmark the perforamnce. – Manish Rawat Jan 21 '13 at 18:51
  • and by the way, is this your assumption or you are very much sure about the perforamance in the both queries? – Manish Rawat Jan 21 '13 at 18:53
  • @Manish let's see, I've been working with SQL Server since version 6.5. I think I am pretty sure. There are tons of tools for benchmarking SQL Server performance. Did you google [benchmark SQL Server performance](https://www.google.com/search?q=benchmark+sql+server+performance&oq=benchmark+sql+server+performance&aqs=chrome.0.57j60j57j60l3.4355&sourceid=chrome&ie=UTF-8)? – Aaron Bertrand Jan 21 '13 at 18:55
  • Ok, so which tool you suggest me to use for bench-marking these kind of funny things in MS Sql Server. As there are tons of tools in google, but i don't know which one to use, so better you suggest me your favorite one. And its not the evidence which i am looking for, I want to know the evidence of why Select * is lacking in the performance as answered by most of the geeks. – Manish Rawat Jan 21 '13 at 19:05
  • That seems like a different question, and it wouldn't be on topic here. It will quickly be closed as a shopping list question because tools come and go, and there can't be a "correct" answer since different people have different requirements for a benchmarking tool. I can recommend [SQL Sentry Plan Explorer](http://sqlsentry.net/plan-explorer/sql-server-query-view.asp) as a very easy way to compare two queries. *Disclaimer: I work for SQL Sentry.* – Aaron Bertrand Jan 21 '13 at 20:41