6

Possible Duplicate:
Why is SELECT * considered harmful?

Probably a database nOOb question.

Our application has a table like the following

TABLE WF

Field              | Type        | Null | Key | Default | Extra          |
+--------------------+-------------+------+-----+---------+----------------+
| id                 | int(11)     | NO   | PRI | NULL    | auto_increment | 
| children           | text        | YES  |     | NULL    |                | 
| w_id               | int(11)     | YES  |     | NULL    |                | 
| f_id               | int(11)     | YES  |     | NULL    |                | 
| filterable         | tinyint(1)  | YES  |     | 1       |                | 
| created_at         | datetime    | YES  |     | NULL    |                | 
| updated_at         | datetime    | YES  |     | NULL    |                | 
| status             | smallint(6) | YES  |     | 1       |                | 
| visible            | tinyint(1)  | YES  |     | 1       |                | 
| weight             | int(11)     | YES  |     | NULL    |                | 
| root               | tinyint(1)  | YES  |     | 0       |                | 
| mfr                | tinyint(1)  | YES  |     | 0       |                | 
+--------------------+-------------+------+-----+---------+----------------+

This table is expected to be upwards of ten million records. The schema is not expected to change much. I need to retrieve the columns f_id, children, status, visible, weight, root, mfr.

Which approach is faster for data retrieval?

1) Select * from WF where w_id = 1 AND status = 1;

I will strip the unnecessary columns in the application layer.

2) Select children,f_id,status,visible,weight,root,mfr from WF where w_id = 1 AND status = 1;

There is no need to strip the unnecessary columns as its pre-selected in the query.

Does any one have a real life benchmark as to which is faster. I know some say Select * is evil, but will MySQL respond faster while trying to get the whole chunk as opposed to retrieving selective columns?

I am using MySQL version: 5.1.37-1ubuntu5 (Ubuntu) and the application is Rails3 app.

Community
  • 1
  • 1
paddle42380
  • 6,921
  • 7
  • 32
  • 40
  • 7
    You're the one with the 10 million record table; how much more real life can you get than your benchmark? – JeffO Feb 16 '11 at 18:31
  • I am doing a migration onto Rails. I dont have access to the current production. Technically, I do not have the data right now. – paddle42380 Feb 16 '11 at 18:32
  • @Jeff O I wish you got points for comment upvotes. That comment says it all. – corsiKa Feb 16 '11 at 18:32
  • That's a lot of records, you'll probably will have to migrate before that happens – Carlos Valenzuela Feb 16 '11 at 18:33
  • @Chuck - I agree i need the data, but it will not happen until another couple of weeks and I need to proceed with my queries right away – paddle42380 Feb 16 '11 at 18:37
  • Even if you didn't have 10m rows, you could generate dummy data in a staging environment to test. http://www.generatedata.com/ – p.campbell Feb 16 '11 at 18:40
  • 1
    @p.campbell et al: Trigger happy closers - the "duplicate" question is too general, this one has a specific scenario with specific structure and volumes as well as a specific RDBMS (MySQL). Not to mention app framework (Ruby) as well. A very detailed specific answer could be arrived at for this question – RichardTheKiwi Feb 16 '11 at 18:48
  • @Richard: the base of the question is whether to specify the `n` columns, or retrieve all columns by use of `*`. Rails, Ubuntu, MySQL, etc, have no relevance to the core of the question. It boils down to 'should I employ an anti-pattern on my RDBMS'? I'd hardly categorize OMG and Joe as trigger-happy closers, but rather insightful answerers. Thanks for your opinion! Have A Nice Day! – p.campbell Feb 16 '11 at 19:19
  • 1
    @p.c I disagree. The question was not about maintainability, robustness, standards, (anti)patterns - it was simply put `Which approach is faster for data retrieval?` There are other questions around (C arena) about the fastest way to do something, for which the answer could be ASM (not pretty C) which would be the right answer. The linked "duplicate" is discussing broad-brush theory and best practice. – RichardTheKiwi Feb 16 '11 at 19:26

2 Answers2

5

As an example of how a select statement that includes a subset of columns can be significantly faster, it can use a covering index on the table that includes just those columns, potentially resulting in much better query performance.

Michael Goldshteyn
  • 71,784
  • 24
  • 131
  • 181
  • Upvote for mentioning covering index. I came here to say that. – Nathan DeWitt Feb 16 '11 at 18:40
  • Thanks, got the jist about the covering index. – paddle42380 Feb 16 '11 at 18:46
  • @papdel A covering index for this query will take 8 (of 12) columns, and will be terribly expensive (relative measure) to maintain – RichardTheKiwi Feb 16 '11 at 18:57
  • Thanks Richard. I got too psyched with all the sudden backlash! – paddle42380 Feb 16 '11 at 19:01
  • @Richard, that depends on the ratio of inserts to queries. Even an 8 of 12 covering index could be worthwhile if the number of queries far outnumbers the number of inserts. Also, the 8 of 12 columns was an example, I gave a general answer to the general question. – Michael Goldshteyn Feb 16 '11 at 20:19
  • @Mic And that is why I object to the closing of the question on the basis of a duplicate. Your general answer was covered in the linked question in far more detail, but the metrics may not be relevant to the question specifics. If anything the question could be closed as "too localised" not as "duplicate" – RichardTheKiwi Feb 16 '11 at 20:42
0

If you return fewer columns there is less data to go across the network and less data for the database to process and it will almost always return faster. Databases also tend to be slower using select * because the database then has to go figure out what the columns are and thus do more work than when you specify. Further select * will often return bad results if the structure changes significantly. It may end up showing the user fields you don;t want them to see or if someone is silly enough to rearrange the columns, then the application may actually appear to show things in the wrong order or if doing an insert from the data, put them in the wrong column. It is almost alawys a poor practice to use selct * in production code.

HLGEM
  • 94,695
  • 15
  • 113
  • 186