Does the total number of columns in a table impact performance (if the same subset of columns is selected, and if there are no indices on the table)?
Yes, marginally. With no indexes at all, both queries (against Table A and Table B) will do table scans. Given that Table B has fewer columns than Table A, the rows per page (density) will be higher on B, so B will be marginally quicker because fewer pages need to be fetched.
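The density difference can be checked by comparing page counts for the two tables. This is only a rough sketch: it assumes the tables are dbo.A and dbo.B in the current database (the dbo schema is an assumption, it is not stated in the question) and uses the sys.dm_db_index_physical_stats function in SAMPLED mode.

```sql
-- Compare page count and average row size of the base tables.
-- Assumption: the tables are named dbo.A and dbo.B.
-- index_id 0 = heap, 1 = clustered index, i.e. the table data itself.
SELECT N'A' AS table_name, page_count, record_count, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.A'), NULL, NULL, 'SAMPLED')
WHERE index_id IN (0, 1)
  AND index_level = 0                       -- leaf / data pages only
  AND alloc_unit_type_desc = 'IN_ROW_DATA'
UNION ALL
SELECT N'B', page_count, record_count, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.B'), NULL, NULL, 'SAMPLED')
WHERE index_id IN (0, 1)
  AND index_level = 0
  AND alloc_unit_type_desc = 'IN_ROW_DATA';
```

With the same number of rows, the narrower table should report a smaller page_count, which is the number of pages a full scan has to read.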
However, given that your queries are of the form:
SELECT b,c,d
FROM X
WHERE f='foo';
the performance of the query will be dominated by the indexing on column f, rather than the number of columns in the underlying tables.
For the OP's exact queries, the fastest performance will result from the following indexing:
- Index on A(f) INCLUDE (b,c,d)
- Index on B(f) INCLUDE (b,c,d)
Irrespective of the number of columns in Table A or Table B, with the above indexes in place, performance should be identical for both queries (assuming the same number of rows and similar data in both tables). SQL Server can answer each query from the index alone, and the two indexes have similar column widths and row densities, so no additional data is needed from the original table.
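As a sketch, the two covering indexes could be created as follows. The index names and the dbo schema are illustrative assumptions, not something taken from the question.

```sql
-- Covering indexes for: SELECT b, c, d FROM X WHERE f = 'foo'
-- f is the only key column; b, c and d are carried at the leaf level only.
-- Index names and the dbo schema are hypothetical.
CREATE NONCLUSTERED INDEX IX_A_f_incl_bcd
    ON dbo.A (f)
    INCLUDE (b, c, d);

CREATE NONCLUSTERED INDEX IX_B_f_incl_bcd
    ON dbo.B (f)
    INCLUDE (b, c, d);
```

Because both indexes contain exactly the same four columns, their size no longer depends on how wide the base tables are.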
Does the number of columns in the select affect query performance?
The main benefit of returning fewer columns in a SELECT is that SQL Server might be able to avoid reading from the table / clustered index altogether, if it can instead retrieve all the selected data from an index (as key columns and / or as included columns, in the case of a covering index).
Obviously, the columns used in the predicate (where filter), i.e. f in your example, MUST be in the key columns of the index (as the leading column, if the index is to be used for a seek), and the data distribution must be sufficiently selective, in order for the index to be used in the first place.
There is also a secondary benefit in returning fewer columns from a SELECT, as this will reduce any I/O overhead, especially if there is a slow network between the database server and the app consuming the data. It is good practice to only ever return the columns you actually need, and to avoid using SELECT *.
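One way to observe the effect of the select list on page reads is to run both forms of the query with I/O statistics switched on. This is a minimal sketch, assuming the table is dbo.B (the schema name is an assumption) and that some index on f already exists.

```sql
-- Report logical reads per statement (shown in the Messages output).
SET STATISTICS IO ON;

-- Only the columns actually needed: can be served entirely from a
-- covering index on f, if one exists.
SELECT b, c, d
FROM dbo.B
WHERE f = 'foo';

-- Every column: needs data that a narrow index does not carry, so the
-- base table has to be read as well (via lookups or a scan).
SELECT *
FROM dbo.B
WHERE f = 'foo';

SET STATISTICS IO OFF;
```

This only shows the server-side page reads. The network cost of shipping the extra columns to the client comes on top of that and is not reflected in these numbers.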
Edit
Some other plans:
- Index on B(f) with no other key or INCLUDE columns, or with an incomplete set of INCLUDE columns (i.e. one or more of b, c or d are missing):
SQL Server will likely need to do a Key or RID Lookup, because even if the index is used, there will be a need to "join" back to the table to retrieve the missing columns in the select clause. (The lookup type depends on whether the table has a clustered index, giving a Key Lookup, or is a heap, giving a RID Lookup.)
- Straight nonclustered index on B(f,b,c,d):
This will still be very performant, as the index will be used and the table avoided, but it won't be quite as good as the index with INCLUDE columns, because the density of the index tree will be lower due to the additional key columns, which are stored at the non-leaf levels of the index as well.
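For comparison, the two alternative indexes described above could be sketched like this. Only one of them would normally be created at a time; the index names and the dbo schema are hypothetical.

```sql
-- Alternative 1: key on f only, nothing included.
-- b, c and d must be fetched from the table via a Key or RID Lookup.
CREATE NONCLUSTERED INDEX IX_B_f
    ON dbo.B (f);

-- Alternative 2: all four columns as key columns.
-- Still covers the query, but b, c and d are also part of the key.
CREATE NONCLUSTERED INDEX IX_B_f_b_c_d
    ON dbo.B (f, b, c, d);
```

Comparing the actual execution plans for the query under each index should show a lookup operator for the first alternative and a plain index seek for the second.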