63

I've always preached to my developers that SELECT * is evil and should be avoided like the plague.

Are there any cases where it can be justified?

I'm not talking about COUNT(*) - which most optimizers can figure out.

Edit

I'm talking about production code.

And one great example I saw of this bad practice was a legacy asp application that used select * in a stored procedure, and used ADO to loop through the returned records, but got the columns by index. You can imagine what happened when a new field was added somewhere other than the end of the field list.

codaddict
  • 445,704
  • 82
  • 492
  • 529
ScottE
  • 21,530
  • 18
  • 94
  • 131
  • 34
    What's wrong with it when you need to extract all the fields? – Matteo Riva Sep 03 '10 at 12:49
  • 1
    Just looked at my own code base, and it's all either (1) ad-hoc programmer-test code that's been recorded in the files as comments, or (2) conditions in drop/update scripts generated by SSMS: `if exists (select * from sysobjects where...` – Steve Jessop Sep 03 '10 at 12:57
  • 17
    "What's wrong with it" That you never know what you are going to get. Or in what order. In programming you generally want predictable results. – Stijn de Witt Sep 03 '10 at 12:58
  • 8
    Yeah, maybe someone can explain why select * is bad? I'm certainly not a DBA, but it seems pointless to write out a huge list of column names when * works just as well. – jonescb Sep 03 '10 at 12:59
  • @kemp - It depends whether you're talking about needing to extract all the columns that currently exist or whether your application has a semantic need to always extract all columns whatever they might be in the future. This last case is much rarer. – Martin Smith Sep 03 '10 at 12:59
  • @kemp: quite. I think the question is "do you ever need to extract all the fields?". In other words, should you write code which will suddenly start seeing a different set of columns if the schema changes? If you're preparing a report of some kind, do you want the report format to change when you add a timestamp to the table, or not? – Steve Jessop Sep 03 '10 at 13:01
  • You probably shouldn't be writing production code, least of all application code where it's possible to use a \* select anyway. ORM FTW. – annakata Sep 03 '10 at 13:01
  • 7
    @annakata: I doubt that answers along the lines of "you shouldn't be writing SQL, ever" are going to find much favour with hardcore SQL programmers ;-) – Steve Jessop Sep 03 '10 at 13:02
  • I mean: I have a database schema that I know, and I need a query which extracts all the fields from a table for the selected records. What problem can `SELECT *` cause? I would reference fields by name, so order is irrelevant. – Matteo Riva Sep 03 '10 at 13:02
  • 1
    @Stijn de Witt - there is a school of thought which says you shouldn't care what you get so long as it fulfils what you need. And if I'm writing code which reads SQL fields in order, I'm doing it wrong :) – annakata Sep 03 '10 at 13:03
  • @kemp In SQL server, for example, for performance reasons you should specify all columns even if you're returning all columns. And as in my example above, you can get some weird behaviours if you change the table definition later. – ScottE Sep 03 '10 at 13:04
  • @Steve - No doubt :) But most of us just don't actually need SQL anymore. Maybe if it stopped living in the seventies :P (HHOS) – annakata Sep 03 '10 at 13:05
  • @ScottE: that seems more related to using numeric indexes rather then the type of select. – Matteo Riva Sep 03 '10 at 13:05
  • 3
    What I mean is that an `INSERT` which doesn't specify column names **will** break on schema changes, whereas a `SELECT *` breaks only if you blindly use whatever you get or rely on its order (eg. numeric indexes). – Matteo Riva Sep 03 '10 at 13:10
  • 3
    @annakata: I don't entirely disagree. Writing SPs in SQL is not my favourite part of the project I do which involves it, and I'm not a proper SQL programmer anyway. But the DB is accessed from multiple languages, and I'm not aware of an ORM-based approach which would improve things. Sometimes, "this is the language I'm using" is a hard constraint, and it seems unfair to mock those stuck with it (especially since I'm one of them)... – Steve Jessop Sep 03 '10 at 13:14
  • 2
    @kemp: if you're not going to blindly rely on what you get, but instead are going to use specifically the columns you know about and understand, why not just ask for exactly those columns? I guess if you have a plan for what you'll do if a column has disappeared, other than just fail with an error, then it would make sense to `select *` and then compare the columns you get with what you semi-expect. – Steve Jessop Sep 03 '10 at 13:18
  • @Steve: I understand your point and I agree with you, but in that case `SELECT *` is just pure convenience. After all, if a column disappears and I'm not aware of it, I'll get an error anyway -- just in a different place. In short, it may not be optimal, but it doesn't seem so evil :) – Matteo Riva Sep 03 '10 at 13:23
  • @kemp: it's about robustness, I think. If you want, "everything that is currently in the schema, today, as I write this code", the way to express that is by listing the columns. `select *` means, "everything in the schema at the time I execute the query". It's fairly rare (although not unheard-of) to write code that can correctly handle columns of which the programmer had no conception at time of writing, but that's the only code which can write `select *` without thereby making itself brittle with respect to the addition of new columns, or at least getting columns it isn't actually using. – Steve Jessop Sep 03 '10 at 13:25
  • 6
    @kemp, you are trading a minute of extra work for hours of slower performance for all the users, it is just lazy not to list the columns. Every select * takes longer than a select with columns listed even when you list al of them. Every select * with an inner join returns data you demonstrably don't need wasting both server and network resources. If you do this for every query you are causing a huge performance problem that is painful to fix. In most databases you can drag them over anyway, so how hard is that? – HLGEM Sep 03 '10 at 17:04
  • 6
    One gotcha with `select *` on at least some RDBSes (maybe all) is that when you create a view like `create view X as select * from Y` the column list is actually frozen at the time of view creation. If a column gets added later, the view doesn't see it. But looking at the view definition, it looks like the view should show it. Not fun to troubleshoot. – Shannon Severance Sep 03 '10 at 23:06
  • In PostgreSQL, if you `CREATE VIEW bar AS SELECT * FROM foo;`, then add columns to foo later on, they won't automatically appear in `bar`. That's one danger of `SELECT *`, but this issue can be dodged by using a stored procedure instead. – Joey Adams Oct 28 '10 at 00:25

20 Answers20

46

I'm quite happy using * in audit triggers.

In that case it can actually prove a benefit because it will ensure that if additional columns are added to the base table it will raise an error so it cannot be forgotten to deal with this in the audit trigger and/or audit table structure.

(Like dotjoe) I am also happy using it in derived tables and column table expressions. Though I habitually do it the other way round.

WITH t
     AS (SELECT *,
                ROW_NUMBER() OVER (ORDER BY a) AS RN
         FROM   foo)
SELECT a,
       b,
       c,
       RN
FROM   t; 

I'm mostly familiar with SQL Server and there at least the optimiser has no problem recognising that only columns a,b,c will be required and the use of * in the inner table expression does not cause any unnecessary overhead retrieving and discarding unneeded columns.

In principle SELECT * ought to be fine in a view as well as it is the final SELECT from the view where it ought to be avoided however in SQL Server this can cause problems as it stores column metadata for views which is not automatically updated when the underlying tables change and the use of * can lead to confusing and incorrect results unless sp_refreshview is run to update this metadata.

Community
  • 1
  • 1
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • 3
    +1, and when not using a trigger but an OUTPUT clause to send the data to a audit table as well – KM. Sep 03 '10 at 13:00
34

There are many scenarios where SELECT * is the optimal solution. Running ad-hoc queries in Management Studio just to get a sense of the data you're working with. Querying tables where you don't know the column names yet because it's the first time you've worked with a new schema. Building disposable quick'n'dirty tools to do a one-time migration or data export.

I'd agree that in "proper" development, you should avoid it - but there's lots of scenarios where "proper" development isn't necessarily the optimum solution to a business problem. Rules and best practices are great, as long as you know when to break them. :)

Dylan Beattie
  • 53,688
  • 35
  • 128
  • 197
  • "Querying tables where you don't know the column names yet because it's the first time you've worked with a new schema" - e.g. when implementing an ORM that builds its objects from the DB schema, rather than the other way around. – Steve Jessop Sep 03 '10 at 12:48
  • 12
    "Rules and best practices are great, as long as you know when to break them" +1, I'm always being told "that's good enough". – whostolemyhat Sep 03 '10 at 15:13
  • 3
    Unfortunately, in my experience, this is not why people use select * - they use it due to laziness. – ScottE Sep 03 '10 at 16:10
28

I'll use it in production when working with CTEs. But, in this case it's not really select *, because I already specified the columns in the CTE. I just don't want to respecify in the final select.

with t as (
    select a, b, c from foo
)

select t.* from t;
dotjoe
  • 26,242
  • 5
  • 63
  • 77
25

None that I can think of, if you are talking about live code.

People saying that it makes adding columns easier to develop (so they automatically get returned and can be used without changing the Stored procedure) have no idea about writing optimal code/sql.

I only ever use it when writing ad-hoc queries that will not get reused (finding out the structure of a table, getting some data when I am not sure what the column names are).

Oded
  • 489,969
  • 99
  • 883
  • 1,009
16

I think using select * in an exists clause is appropriate:

select some_field from some_table 
where exists 
 (select * from related_table [join condition...])

Some people like to use select 1 in this case, but it's not elegant, and it doesn't buy any performance improvements (early optimization strikes again).

Jordão
  • 55,340
  • 13
  • 112
  • 144
  • 5
    `select 1` is early optimization and not elegant? wha?! lol, it's the difference between 1 character or another. Hardly takes any effort. Actually, `select 1` probably shows a better understanding of the exists clause. It only cares about rows. – dotjoe Sep 03 '10 at 14:34
  • 9
    @dotjoe: The problem with `select 1` is that people think they're creating a smart optimization (I heard it numerous times). They're not. As for the conceptual difference, an exists clause serves to check if _any_ row exists, not just 1, so I think * is better to describe _any_ than 1. – Jordão Sep 03 '10 at 14:40
  • 1
    Yea, I said the exists clause only cares about rows. So, why do we need to use *, which only relates to columns? By using 1 you clearly show that we don't care about the columns, only rows. – dotjoe Sep 03 '10 at 15:00
  • 3
    My problem with the 1 is the "I'm a smart optimizer" problem. Some people even go as far as using `select top 1 1` (in SQL-Server). As for the conceptual problems, I don't think we can ever reach an agreement. What we should be able to do in an exists clause, to really put an end to the debate, was to omit the columns altogether: `select from table`, or have another syntax that says "I don't care": `select _ from table` (borrowed from some functional languages). When the language doesn't provide the tools to express _exactly_ what we want, conceptual differences are all open to interpretation. – Jordão Sep 03 '10 at 15:06
  • I totally agree with that! where is the `exists(from table)` syntax? omit the select clause altogether would be my vote. – dotjoe Sep 03 '10 at 16:29
  • dotjoe: What are you talking about, you can't emit the `SELECT` clause.. Where did you get that syntax? – Evan Carroll Sep 03 '10 at 20:07
  • @Evan Carroll it's my pipe dream syntax request. it doesn't exist yet. – dotjoe Sep 03 '10 at 20:09
  • 3
    Definitive answer: it *is* acceptable. Anything else is ignorance/superstition. The ANSI standard mentions that the column list in the EXIST is not parsed. Page 191, http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt. Which leads to this snippet: ...EXISTS(SELECT 1/0 FROM... please see examples here: http://stackoverflow.com/questions/2019958/ – gbn Sep 05 '10 at 08:56
  • @dotjoe, @Jordão: please see my comment above, I can't edit it now. And also see this: http://stackoverflow.com/questions/1597442 – gbn Sep 05 '10 at 09:04
  • @ScottE: For COUNT(*), please see this http://stackoverflow.com/questions/1221559/count-vs-count1/1221649#1221649 – gbn Sep 05 '10 at 09:05
  • @gbn: Thanks for the info. That's exactly what I hinted at when I mentioned that there're no performance differences between * and 1. It makes sense that the column list is completely ignored, as the only thing that matters is if rows exists. What's left, then, is a matter of style, and IMO * is better than the alternatives. – Jordão Sep 05 '10 at 11:11
  • oh I like that `select 1/0` example. See, they really should just drop the select part altogether. @Jordão I know you like your * but your only knock against my 1 is that some other dev thinks it's an optimization. And I say that's his problem not mine. lol. – dotjoe Sep 05 '10 at 18:15
  • 1
    OK, let's see if we can agree on this: `select 'rows' from table`. It still doesn't change the validity of my answer. – Jordão Sep 05 '10 at 23:15
7

In production code, I'd tend to agree 100% with you.

However, I think that the * more than justifies its existence when performing ad-hoc queries.

Alex Humphrey
  • 6,099
  • 4
  • 26
  • 41
6

You've gotten a number of answers to your question, but you seem to be dismissing everything that isn't parroting back what you want to hear. Still, here it is for the third (so far) time: sometimes there is no bottleneck. Sometimes performance is way better than fine. Sometimes the tables are in flux, and amending every SELECT query is just one more bit of possible inconsistency to manage. Sometimes you've got to deliver on an impossible schedule and this is the last thing you need to think about.

If you live in bullet time, sure, type in all the column names. But why stop there? Re-write your app in a schema-less dbms. Hell, write your own dbms in assembly. That'd really show 'em.

shanusmagnus
  • 2,340
  • 2
  • 20
  • 19
  • You're rocking the boat, SM. They all prefer to adhere to the religion in vogue, than notice that actually, one size doesn't fit all. I'm just surprised that no one's advocating automated tools (and their inevitable costs) to rewrite all that SQL that often doesn't need changing anyway. – ChrisA Sep 03 '10 at 22:19
  • Agreed. It all depends on your context - for large projects, you want to optimize everything. But for a lot of smaller projects (which can still be very large expensive affairs), 'SELECT *' is fine. For most of my projects, dev time is a much higher priority than performance. And any decent ORM will take care of the indexing/adding columns problem. I'd say it's justified when the performance penalties don't matter, and it does not contribute to bugs/vulnerabilities. Certain levels of optimization is good, but not always necessary - or appropriate. – rocketmonkeys Dec 06 '10 at 21:24
  • 1
    Stating the obvious here: it's not really about performance. "SELECT *" is a bad habit because it makes for fragile code. –  Jul 18 '13 at 20:28
4

And remember if you use select * and you have a join at least one field will be sent twice (the join field). This wastes database resources and network resources for no reason.

HLGEM
  • 94,695
  • 15
  • 113
  • 186
2

As a tool I use it to quickly refresh my memory as to what I can possibly get back from a query. As a production level query itself .. no way.

Peter M
  • 7,309
  • 3
  • 50
  • 91
2

When creating an application that deals with the database, like phpmyadmin, and you are in a page where to display a full table, in that case using SELECT * can be justified, I guess.

aularon
  • 11,042
  • 3
  • 36
  • 41
  • Nope it can't in that case. Select * is still a performance problem and columns are added to tables for adminstrative purposes that users should not see like GUIDs. – HLGEM Sep 03 '10 at 13:57
  • I understand, but in application in the nature of phpmyadmin, where the user is a database administrator, who build and is allowed to see column names, in that case I think it's justified to just use `SELECT *`, and I don't think it will be slower than building a query with _all_ fieldnames and execute it. – aularon Sep 03 '10 at 14:01
  • +1, aularon. I was using a DB editor the other day that has this problem. Whenever a column changed outside the editor I had to manually refresh the metadata to get the table views to show the new columns. If the editor just did a `SELECT *` when it queried, I wouldn't have to worry about cached metadata being out-of-date. – Gabe Sep 05 '10 at 07:16
2

About the only thing that I can think of would be when developing a utility or SQL tool application that is being written to run against any database. Even here though, I would tend to query the system tables to get the table structure and then build any necessary query from that.

There was one recent place where my team used SELECT * and I think that it was ok... we have a database that exists as a facade against another database (call it DB_Data), so it is primarily made up of views against the tables in the other database. When we generate the views we actually generate the column lists, but there is one set of views in the DB_Data database that are automatically generated as rows are added to a generic look-up table (this design was in place before I got here). We wrote a DDL trigger so that when a view is created in DB_Data by this process then another view is automatically created in the facade. Since the view is always generated to exactly match the view in DB_Data and is always refreshed and kept in sync, we just used SELECT * for simplicity.

I wouldn't be surprised if most developers went their entire career without having a legitimate use for SELECT * in production code though.

Tom H
  • 46,766
  • 14
  • 87
  • 128
2

I've used select * to query tables optimized for reading (denormalized, flat data). Very advantageous since the purpose of the tables were simply to support various views in the application.

Andrew Church
  • 1,391
  • 11
  • 13
2

How else do the developers of phpmyadmin ensure they are displaying all the fields of your DB tables?

JD Isaacks
  • 56,088
  • 93
  • 276
  • 422
1

It is conceivable you'd want to design your DB and application so that you can add a column to a table without needing to rewrite your application. If your application at least checks column names it can safely use SELECT * and treat additional columns with some appropriate default action. Sure the app could consult system catalogs (or app-specific catalogs) for column information, but in some circumstances SELECT * is syntactic sugar for doing that.

There are obvious risks to this, however, and adding the required logic to the app to make it reliable could well simply mean replicating the DB's query checks in a less suitable medium. I am not going to speculate on how the costs and benefits trade off in real life.

In practice, I stick to SELECT * for 3 cases (some mentioned in other answers:

  • As an ad-hoc query, entered in a SQL GUI or command line.
  • As the contents of an EXISTS predicate.
  • In an application that dealt with generic tables without needing to know what they mean (e.g. a dumper, or differ).
Edmund
  • 10,533
  • 3
  • 39
  • 57
1

Yes, but only in situations where the intention is to actually get all the columns from a table not because you want all the columns that a table currently has.

For example, in one system that I worked on we had UDFs (User Defined Fields) where the user could pick the fields they wanted on the report, the order as well as filtering. When building a result set it made more sense to simply "select *" from the temporary tables that I was building instead of having to keep track of which columns were active.

Biff MaGriff
  • 8,102
  • 9
  • 61
  • 98
1
  1. I have several times needed to display data from a table whose column names were unknown. So I did SELECT * and got the column names at run time.

  2. I was handed a legacy app where a table had 200 columns and a view had 300. The risk exposure from SELECT * would have been no worse than from listing all 300 columns explicitly.

egrunin
  • 24,650
  • 8
  • 50
  • 93
  • If you don't know the columns at run time, I would suspect a design problem. But true it is the only viable choice inthat case. – HLGEM Sep 03 '10 at 16:59
0

I know I'm very late to the party but I'll chip in that I use select * whenever I know that I'll always want all columns regardless of the column names. This may be a rather fringe case but in data warehousing, I might want to stage an entire table from a 3rd party app. My standard process for this is to drop the staging table and run

select * 
into staging.aTable 
from remotedb.dbo.aTable

Yes, if the schema on the remote table changes, downstream dependencies may throw errors but that's going to happen regardless.

Bob Probst
  • 9,533
  • 8
  • 32
  • 41
0

Select * in production code is justifiable any time that:

  • it isn't a performance bottleneck
  • development time is critical

Why would I want the overhead of going back and having to worry about changing the relevant stored procedures, every time I add a field to the table?

Why would I even want to have to think about whether or not I've selected the right fields, when the vast majority of the time I want most of them anyway, and the vast majority of the few times I don't, something else is the bottleneck?

If I have a specific performance issue then I'll go back and fix that. Otherwise in my environment, it's just premature (and expensive) optimisation that I can do without.


Edit.. following the discussion, I guess I'd add to this:

... and where people haven't done other undesirable things like tried to access columns(i), which could break in other situations anyway :)

ChrisA
  • 4,163
  • 5
  • 28
  • 44
  • 4
    Because you should see what is being returned and whether you need it in the stored proc now or not. Fields are often added that are not needed and which will confuse the user if they are shown. Being too lazy to do necessary maintenance when you make a change is an excuse not a good practice. – HLGEM Sep 03 '10 at 16:59
  • I agree with you. In some parts of an app performance is not particularly important, in which case Select * does fine thanks very much- listing the fields has an impact on future development time and potential future bugs – DannykPowell Sep 03 '10 at 20:26
  • 2
    Using select * is a maintenance nightmare. Been there, done that. Had to fix other's crappy code. Change your schema? Everything can break. Just don't use select * in production code. – Starkey Sep 03 '10 at 20:35
  • Confuse the user? What nonsense this is. As if 'select *' in the SQL implies anything at all about what will be shown to the user. Ever heard of separation of data layer from UI? It would seem not. And 'select *' can only break other things if they make assumptions about what columns appear in a particular order. Of course 'columns(i)' could break if the underlying data changes - but it could break if you specify the columns in your select too, and that changes. Still, stick to the usual religious dogma if you prefer. – ChrisA Sep 03 '10 at 22:14
0

If you want to find all the columns and want order, you can do the following (at least if you use MySQL):

SHOW COLUMNS FROM mytable FROM mydb; (1)

You can see every relevant information about all your fields. You can prevent problems with types and you can know for sure all the column names. This command is very quick, because you just ask for the structure of the table. From the results you will select all the name and will build a string like this:

"select " + fieldNames[0] + ", fieldNames[1]" + ", fieldNames[2] from mytable". (2)

If you don't want to run two separate MySQL commands because a MySQL command is expensive, you can include (1) and (2) into a stored procedure which will have the results as an OUT parameter, that way you will just call a stored procedure and every command and data generation will happen at the database server.

Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175
0

Depends on the context of the production software.

If you are writing a simple data access layer for a table management tool where the user will be selecting tables and viewing results in a grid, then it would seem *SELECT ** is fine.

In other words, if you choose to handle "selection of fields" through some other means (as in automatic or user-specified filters after retrieving the resultset) then it seems just fine.

If on the other hand we are talking about some sort of enterprise software with business rules, a defined schema, etc. ... then I agree that *SELECT ** is a bad idea.

EDIT: Oh and when the source table is a stored procedure for a trigger or view, "*SELECT **" should be fine because you're managing the resultset through other means (the view's definition or the stored proc's resultset).

Allbite
  • 2,367
  • 1
  • 24
  • 22