3

I have a rails/backbone single page application processing a lot of SQL queries.

Is this request:

SELECT * FROM `posts` WHERE `thread_id` = 1

faster than this one:

SELECT `id` FROM `posts` WHERE `thread_id` = 1

How big is the impact of selecting unused columns on the query execution time?

nakwa
  • 1,157
  • 1
  • 13
  • 25

4 Answers4

6

For all practical purposes, when looking for a single row, the difference is negligible. As the number of result rows increases, the difference can become more and more important, but as long as you have an index on thread_id and you are not more than 10-20% of all the rows in the table, here is still not a big issue. FYI the differentiation factor comes from the fact that selecting * will force, for each row, an additional lookup in the primary index. Selecting only id can be satisfied just by looking up the secondary index on thread_id.

There is also the obvious cost associated with any large field, like BLOB documents or big test fields. If the posts fields have values measuring tens of KBs, then obviously retrieving them adds extra transfer cost.

All these assume a normal execution engine, with B-Tree or ISAM row-mode storage. Almost all 'tables' and engines would fall into this category. The difference would be significant if you would be talking about a columnar storage, because columnar storage only reads the columns of interests and reading extra columns unnecessary impacts more visible such storage engines.

Having or not having an index on thread_id will have a hugely more visible impact. Make sure you have it.

Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
3

Selecting fewer columns is generally faster. Unfortunately, it is hard to say exactly how much the time difference will be. It may depend on things like how many columns there are and what data is in them (for example, large CLOBS can take longer to fetch than simple integers), what indexes have been set up, and the network latency between you and the database server.

For a definitive answer on the time difference, the best I can say is do both queries and see how long each takes.

Paul J Abernathy
  • 993
  • 2
  • 12
  • 25
1

There will be two components: The query time and the I/O time (you could also break down the I/O into server I/O and server-client (network) I/O).

Selecting just one column will be faster in both respects - certainly because there's less data to fetch and transmit, but also because the column in question could be a part of whatever index is used to find the data, so the server may not have to look up the actual data pages - it may be able to pull the data directly from the index.

D Stanley
  • 149,601
  • 11
  • 178
  • 240
0

The performance difference is almost certainly insignificant for your application. Try it and see whether you can detect a difference; it is very simple to try.

Carl Manaster
  • 39,912
  • 17
  • 102
  • 155