why there is a performance difference between single column select and multi column select

Question

I have table Report which has 10+ columns, where each field except autoincrement id is varchar. Maximum size of any row of table is ~80000 bytes.

I observed the considerable amount of difference between below 2 queries where both server and client are on the same system(with ~1.5M rows)(also having index on column name)

select * from Report;

&

select name from Report;

I went through other stackoverflow ques, where Ans suggesting that

"One reason that selecting specific columns is better is that it raises the probability that SQL Server can access the data from indexes rather than querying the table data. "

My Ques are :

1) Even when using index we need to read from actual DB(i.e. Disk), so same Disk I/o will get performed when we selecting single column or multiple columns(let say by using *).

2) Can someone explain it with little internals that why there is a difference between single column select and multiple column select. As I understood databases try to store the complete row in same disk sector.

Let say for some query for User table like

select first_name from user & 
select first_name, age from user

3) Does DB perform same disk I/O(for above queries), or disk I/O's may differ and what if User table has more then 10 columns and we are selecting 5~8 cols still disk I/O remains same?

Already read in other Ans that using select * is not preferable and it may cause issues with views and maintenance(cases when adding new columns to DB)

It maybe a silly que, since still learning databases(newbie to subject). Please provide link if some ques already answered anywhere.

Your assumptions are wrong. If the index contains all the selected columns, SQL Server will not hit the disk to get them. — Panagiotis Kanavos, Sep 04 '14 at 09:45
http://stackoverflow.com/questions/609343/what-are-covering-indexes-and-covered-queries-in-sql-server but the actual answer given down by Panagiotis hits the nail on the head perfectly. — Dave Brown, Sep 04 '14 at 10:00

score 5 · Answer 1 · answered Sep 04 '14 at 09:51

Your assumptions are wrong. If the index contains all the selected columns, SQL Server will not hit the disk to get them, it will retrieve them from the index data. Indexes are much more likely to be kept in memory than data pages so a query may not have to hit the disk at all.

If your select contains columns that you don't need but are not contained in any indexes, the server will have to read the data from disk for no reason.

Another thing that does change is network I/O. A production server probably has to serve several thousand queries per minute, to hundreds of clients. Including useless columns in the data impacts all clients. Things are even worse if you consider servers used in popular web sites where the clients may be several thousand.

What if selected column/columns are not indexed then, Is there any difference between single column select or multi column select? Does my (2), (3) ques are valid without indexes? — a.m., Sep 04 '14 at 10:26

score 1 · Answer 2 · answered Sep 04 '14 at 09:52

One possible answer is that MySQL will not bother reading the table if the data it needs is in the Index. If name is indexed then that might make sense.

Anpther is simply the time it takes to transfer the results, which can be significant depending on your setup/network speeds etc.

score 1 · Answer 3 · answered Sep 04 '14 at 20:41

The issue is TOAST.

Long data values can have those values compressed and/or stored out-of-line from the rest of the row. This is called TOAST.

If you requested a data value that has been TOASTED, the database has to stop to fetch it and uncompress it, which takes time.

On the other hand, if you refrain from selecting columns you don't need, it doesn't have to fetch those values. Because the biggest values are now out-of-line, the main part of the rows are more tightly packed and better cached, giving much better performance.

In general, the largest values are also the ones least likely to be needed, so this is usually an excellent trade off.

The moral of the story is when performance is important to you, don't request columns you don't need, particularly when those columns can have large values.

PostgreSQL does have index-only-scans, but unless every column in your table is included in a single index, it is not possible to do an index-only-scan to support a select *, so that is not the explanation here.

Let say if rows are inside limit(commonly 8 kB), so that it wont be Toasted,Then Is there any difference between single column select or multi column select(If we don't have indexes on select columns). — a.m., Sep 05 '14 at 04:58

why there is a performance difference between single column select and multi column select

3 Answers3

Linked