5

I've heard that this is the wrong way to write queries in general but I wonder something people say to new users of SQL.

Is it really bad form or just plain lazy?

So, not to solicit too much opinion on this, what are some good reasons for it's use in queries besides not having to type out every field name. Also, do these reasons outweigh the harm in it's use?

Michael Hoffman
  • 32,526
  • 7
  • 64
  • 86
Matthew
  • 8,183
  • 10
  • 37
  • 65
  • It's a tool like any other. If you never use it, don't worry about it. – Marc B Feb 01 '12 at 17:34
  • It can end up biting you later if your schema changes and your code depends on the ordering of columns in a data set. Other than that there's nothing *technically* wrong with it. – Mike Christensen Feb 01 '12 at 17:35
  • @Marc B. I use it all the time and that's what may be worrying me. – Matthew Feb 01 '12 at 17:37
  • possible duplicate of [select * vs select column](http://stackoverflow.com/questions/3180375/select-vs-select-column) – Conrad Frix Feb 01 '12 at 18:09
  • There are alot of things wrong with using it. It is a SQL antipattern and if it is production code, I'd fire you. – HLGEM Feb 01 '12 at 18:14
  • Was constructive, but duplicate of http://stackoverflow.com/questions/65512/which-is-faster-best-select-or-select-column1-colum2-column3-etc. – Marcus Adams Feb 01 '12 at 18:35
  • I liked my title better. Going to make a meta question about it. – Matthew Feb 01 '12 at 18:43
  • @MarcusAdams It is ironic that this question did not get a duplicate answer then: it seems that everyone who answered that other question thought the question was "tell me why `select *` is bad", rather than "are there legitimate cases for `select *`". And in fact, there are. – Sergey Kalinichenko Feb 01 '12 at 18:45
  • @Matthew I would propose a different title: "are there legitimate cases for using `select *`?". I suspect that had the focus of your question been a search for legitimate cases, rather than comparing the two options, the question would remain open. – Sergey Kalinichenko Feb 01 '12 at 18:49
  • @dasblinkenlight - [Then it would be a duplicate of this](http://stackoverflow.com/q/3635781/73226) – Martin Smith Feb 01 '12 at 19:00
  • @dasblinkenlight that's what editing is for right? http://meta.stackexchange.com/questions/120974/which-title-is-better – Matthew Feb 01 '12 at 19:02
  • @Matthew right, that is what editing is for. However, I would not edit this question, because I also answered it. Me editing the question would make it look like I'm tweaking the question so that my answer looks more applicable (and that would be true to a certain extent, too). Either way, as Martin Smith pointed out, this would make your question a duplicate of another asked-and-answered question, so there is no point in editing it. – Sergey Kalinichenko Feb 01 '12 at 19:14

4 Answers4

11

When your program knows about the structure of your table at compile-time or through configuration, using select * is not a good idea: any change in the structure of your table could break the structure of the results coming back from the query, ultimately causing run-time errors.

However, there are cases when * is indispensable. Specifically, if your program "learns" dynamically of the structure of your tables by reading metadata coming back from a query, using the "all columns" request lets your program pick up changes to your tables dynamically.

It goes without saying that using * for ad-hoc queries in your favorite flavor of SQL Studio/SQLPlus/etc. is very common and convenient.

Adam Rackis
  • 82,527
  • 56
  • 270
  • 393
Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • This is a good example of why I think it is ok for what I've been doing lately. I have generic Java classes that take query results and have getters for the fields. If I use *, these Objects pick up any new fields added to the table. – Matthew Feb 01 '12 at 17:43
  • @Matthew - But presumably you need to write code that consumes these extra columns or you end up bringing back a load of data that never gets used? So why wouldn't you just change the query at the same time as extending the code. – Martin Smith Feb 01 '12 at 17:45
  • @Matthew You are right, your example is the classic case for using `select *` in a program for attaining results that would otherwise require a lot more effort. – Sergey Kalinichenko Feb 01 '12 at 17:47
  • 1
    @MartinSmith Not necessarily: you could put newly added data into a hash map organized by name, or do something similar. In a program that uses very late binding this is very much possible (of course you pay with efficiency for this kind of dynamism in your program's behavior, but then performance is the common currency in which you pay for flexibility anyway). – Sergey Kalinichenko Feb 01 '12 at 17:51
  • select * will often kick you out to a table scan instead of an index scan. in other words, you may be taking a performance hit due to your laziness. – sam yi Feb 01 '12 at 17:56
  • @samyi Correct, the first point of my answer is that you should never do it out of laziness: sometimes the results of a query could be obtained entirely by scanning an index, so dragging in columns that are not in the index hurts the performance badly. However, dynamic programs that have a requirement to pick up and process the new structure of your tables on the fly do not have much choice but to do `select *`, because such are requirements. – Sergey Kalinichenko Feb 01 '12 at 18:00
  • @Matthew, your example is exactly the case where it is very poor idea to use select *. What happens when someone adds a column you do not want displayed and which could cause active harm if it was displayed accidentally (which can happen surprisingly often)? What if you simply don't need it? Why are you wasting processor and network resources returning data you don't need? – HLGEM Feb 01 '12 at 18:09
  • @HLGEM I can appreciate that there are reasons why not but my question is more along the lines of why it is good to use. I know there are alot of reasons why not to use this method and I am not looking for a scolding. My original title illustrated this better I think. On a side note, I am glad you are not my boss :) – Matthew Feb 01 '12 at 19:25
3

When using the wildcard * the system has to process what all the column names are for that table, and return them.

Therefore using * is worse for performance than actually typing them all out manually.

The general rule is to avoid using it if you don't have to, and instead specify the columns which you actually require.

Curtis
  • 101,612
  • 66
  • 270
  • 352
  • 6
    The db already has to process the table for field names anyways, to make sure you're not asking for fields which don't exist. The problem is the extra data retrieval - no point in loading/decoding/formatted fields which just get thrown away because the user only wanted one field and not the 50 others in the table. – Marc B Feb 01 '12 at 17:39
  • @Marc B Your comment also lends itself to the fact that if the table changes and a field is removed, less code ends up being broken because of it. – Matthew Feb 01 '12 at 17:56
  • @marcB, your comment is incorrect, there is performance hit when you use select *. – HLGEM Feb 01 '12 at 18:11
1

Using select * relies upon the order of columns being preserved. From a relational point of view the order of columns is irrelevant. The following two tables are identical when operating upon them:

create table alpha (int id, int value);
create table beta (int value, int id);

The only single thing that is different between them is the order of columns. The column names & types are identical. The order of columns should be treated in all cases as an internal implementation detail, and as such should never be programmed to.

There are occasional circumstances, where you just want to select all columns, w/o regard to order. (For example, data analysis programs might do certain kinds of queries to determine the structure of a table -- but again, the structure is the names/data types of the columns, not their order.

Don't use it.

Mike Ryan
  • 4,234
  • 1
  • 19
  • 22
1

The book SQL Antipatterns covers this topic in depth in the chapter "Implicit Columns".

The most typical use of a wildcard for the fieldnames in a query is when writing an ad-hoc query. It is faster to type '*' than it is to type in each field name that you want returned. Programmer time is fairly expensive, so it makes sense not to worry about the slight performance gain when running an ad-hoc query, where you'll waste more time typing in the names.

However, you still should have some knowledge of the database schema, because waiting for the values from a blob column on a large number of rows can waste a lot of time too, especially if all you wanted was the primary key value.

Using SELECT *, you might be getting good performance on a particular query because it's returning a small result set of maybe a few hundred bytes, however, add a couple of blob fields to the table for some other function, and maybe now the result set is in the millions of bytes, and suddenly your query is slow, even though the data that you actually use is very small. To avoid this problem, name the columns in your query.

If you're using SELECT * just so you automatically have the values of new columns when they're added, you'll still be victim to the above scenario. It's a trade off, which you will have to choose.

It's valid to use SELECT * as long as you have a reason, with the exception that laziness is not a valid reason.

Marcus Adams
  • 53,009
  • 9
  • 91
  • 143
  • Don't worry, I would never suggest using it in all cases. I have however been making a lot of use of it lately. I believe I have progressed enough and I'm not using it anymore when I just "need to get some data" from a table. – Matthew Feb 01 '12 at 18:01