28

CASE 1: I have a table with 30 columns and I query using 4 columns in the where clause.

CASE 2: I have a table with 6 columns and I query using 4 columns in the where clause.

What is the difference in performance in both cases?

For example, I have these tables:

CREATE TABLE A
(
  b varchar(10),
  c varchar(10),
  d varchar(10),
  e varchar(10),
  f varchar(10),
  g varchar(10),
  h varchar(10)
);

SELECT b,c,d
FROM A
WHERE f='foo'

CREATE TABLE B
(
  b varchar(10),
  c varchar(10),
  d varchar(10),
  e varchar(10),
  f varchar(10)
);

SELECT b,c,d
FROM B
WHERE f='foo'

Tables A and B have the same structure apart from the number of columns. The column used in the WHERE condition is the same, and the columns in the SELECT list are the same. The only difference is that table A has some extra columns that are not used in the SELECT list or the WHERE condition. In that case, is there any difference in performance between the two queries?

Pradeep Gaur

6 Answers

22

Does the total number of columns in a table impact performance (if the same subset of columns is selected, and if there are no indices on the table)?

Yes, marginally. With no indexes at all, both queries (Table A and Table B) will do table scans. Because Table B has fewer columns than Table A, more rows fit on each page (the density is higher), so B will be marginally quicker as fewer pages need to be fetched.

However, given that your queries are of the form:

SELECT b,c,d
FROM X
WHERE f='foo';

the performance of the query will be dominated by the indexing on column f, rather than the number of columns in the underlying tables.

For the OP's exact queries, the fastest performance will result from the following indexing:

  • Index on A(f) INCLUDE (b,c,d)
  • Index on B(f) INCLUDE (b,c,d)

Irrespective of the number of columns in Table A or Table B, with the above indexes in place, performance should be identical for both queries (assuming the same number of rows and similar data in both tables). SQL Server can answer each query from the index alone, and the two indexes have similar column widths and row densities, so no additional data is needed from the underlying table.
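As a rough way to see this effect, here is a small sketch in Python using the standard library's sqlite3 module rather than SQL Server (an assumption made purely so the example runs anywhere). SQLite has no INCLUDE clause, so the index lists b, c and d as key columns after f. The index names and the plan helper are made up for illustration, and the plan wording differs between engines and versions.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Same shapes as the question's tables A (7 columns) and B (5 columns).
conn.execute("CREATE TABLE A (b TEXT, c TEXT, d TEXT, e TEXT, f TEXT, g TEXT, h TEXT)")
conn.execute("CREATE TABLE B (b TEXT, c TEXT, d TEXT, e TEXT, f TEXT)")

query_a = "SELECT b, c, d FROM A WHERE f = 'foo'"
query_b = "SELECT b, c, d FROM B WHERE f = 'foo'"

# No indexes yet: both queries have to scan the whole table.
before_a = plan(conn, query_a)
before_b = plan(conn, query_b)

# Hypothetical index names; (f, b, c, d) stands in for "f INCLUDE (b, c, d)".
conn.execute("CREATE INDEX ix_a_f_cover ON A (f, b, c, d)")
conn.execute("CREATE INDEX ix_b_f_cover ON B (f, b, c, d)")

# With the covering indexes, neither query needs the base table any more.
after_a = plan(conn, query_a)
after_b = plan(conn, query_b)

print("A before:", before_a)
print("A after: ", after_a)
print("B before:", before_b)
print("B after: ", after_b)
```

In SQLite both queries should go from a scan to a covering-index search, whatever the width of the underlying table. On SQL Server the equivalent check would be to compare the actual execution plans of the two queries.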

Does the number of columns in the select affect query performance?

The main benefit of returning fewer columns in a SELECT is that SQL Server might be able to avoid reading from the table / cluster altogether, if it can retrieve all the selected data from an index instead (either as indexed columns and / or as included columns in the case of a covering index).

Obviously, the columns used in the predicate (where filter), i.e. f in your example, MUST be in the indexed columns of the index, and the data distribution must be sufficiently selective, in order for an index to be used in the first place.

There is also a secondary benefit in returning fewer columns from a SELECT, as this will reduce any I/O overhead, especially if there is a slow network between the Database server and the app consuming the data - i.e. it is good practice to only ever return the columns you actually need, and to avoid using SELECT *.

Edit

Some other plans:

  • Index on B(f) with no other key or INCLUDE columns, or with an incomplete set of INCLUDE columns (i.e. one or more of b, c or d are missing):

SQL Server will likely need to do a Key or RID Lookup, because even if the index is used, there will be a need to "join" back to the table to retrieve the missing columns in the select clause. (The lookup type depends on whether the table has a clustered index or is a heap.)

  • Straight nonclustered index on B(f,b,c,d)

This will still be very performant, as the index will be used and the table avoided, but won't be quite as good as the covering index, because the density of the index tree will be less due to the additional key columns in the index.
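A minimal sketch of the lookup case, using Python's built-in sqlite3 module instead of SQL Server (an assumption, chosen so the snippet is self-contained). SQLite does not report a "Key Lookup" operator; the nearest visible signal is that the plan says USING INDEX rather than USING COVERING INDEX, meaning each match is followed back into the table. The index name ix_b_f and the helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE B (b TEXT, c TEXT, d TEXT, e TEXT, f TEXT)")
# Index on the filter column only: b, c and d are NOT in the index.
conn.execute("CREATE INDEX ix_b_f ON B (f)")


def plan_detail(sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Needs b, c, d from the table, so the index alone cannot answer the query.
lookup_plan = plan_detail("SELECT b, c, d FROM B WHERE f = 'foo'")

# Needs only f, which the index already holds, so the table is never touched.
index_only_plan = plan_detail("SELECT f FROM B WHERE f = 'foo'")

print("b, c, d:", lookup_plan)
print("f only: ", index_only_plan)
```

The first plan uses the index to find matching rows and then visits the table for the missing columns, which is the extra work a covering index avoids.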

StuartLC
  • he wasn't asking about selecting more columns... of course that will affect performance. he asked **does having more columns in a table slow down performance?** – oldboy Oct 29 '18 at 20:21
  • @Anthony It's all in the indexes and page densities. If the subset of columns selected match a narrower non clustered or covering index, then the actual 'table' can be avoided altogether. – StuartLC Oct 29 '18 at 20:32
  • so would a table of 56 columns, which is 99% number data types (mostly `smallint` with a few `mediumint` and one `tinyint`) significantly affect the execution times of queries? typically i will most likely be selecting all of the data at once – oldboy Oct 29 '18 at 20:47
  • @StuartLC this table might even have 110 columns if it's not going to cause issues, but 54 of those would potentially be decimal(3,0) or varchar(3) – oldboy Oct 29 '18 at 21:33
  • @Anthony I've moved the order of the answer so that the 'unindexed' answer is first. However, the performance of both OP's queries will be dominated by indexing on the filtered column(s) (`f` in OP's case). During table design, splitting out a table's columns into 'partial tables' to make the pages narrower rather than keeping all logically related 4NF columns of the same table entity together is drastic. If it really came to that, instead of messing with a logical table design, I would instead look at external NoSql caching alternatives like Redis etc for blinding performance. – StuartLC Nov 20 '18 at 13:42
6

Test it and see!

There will be a performance difference, however 99% of the time you won't notice it - usually you won't even be able to detect it!

You can't even guarantee that the table with fewer columns will be quicker - if it's bothering you then try it and see.
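One way to "try it and see" is a tiny timing harness. The sketch below is only an illustration: it uses Python's sqlite3 module with an in-memory database (so it measures no disk IO at all), and the table names, column names, row count and 1% match rate are all made-up assumptions. A real test should run against your own engine, schema and data volumes.

```python
import sqlite3
import time


def build_table(conn, name, column_count, row_count):
    """Create a table of TEXT columns c0..cN and fill it; c0 is 'foo' in 1 row out of 100."""
    cols = [f"c{i}" for i in range(column_count)]
    conn.execute(f"CREATE TABLE {name} ({', '.join(c + ' TEXT' for c in cols)})")
    placeholders = ", ".join("?" for _ in cols)
    rows = (
        ["foo" if n % 100 == 0 else "bar"] + ["x" * 10] * (column_count - 1)
        for n in range(row_count)
    )
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)


def time_query(conn, name, repeats=5):
    """Run the same 3-column SELECT several times; return (row count, best wall-clock seconds)."""
    sql = f"SELECT c1, c2, c3 FROM {name} WHERE c0 = 'foo'"
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        result = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return len(result), best


conn = sqlite3.connect(":memory:")
build_table(conn, "wide_t", 30, 20_000)    # like CASE 1: 30 columns
build_table(conn, "narrow_t", 6, 20_000)   # like CASE 2: 6 columns

wide_rows, wide_secs = time_query(conn, "wide_t")
narrow_rows, narrow_secs = time_query(conn, "narrow_t")

print(f"wide:   {wide_rows} rows, best of 5 = {wide_secs:.6f}s")
print(f"narrow: {narrow_rows} rows, best of 5 = {narrow_secs:.6f}s")
```

Both queries return the same rows; only the timings differ, and they will vary from run to run and machine to machine.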

Technical rubbish: (from the perspective of Microsoft SQL Server)

With the assumption that in all other respects (indexes, row counts, the data contained in the 6 common columns etc...) the tables are identical, then the only real difference will be that the larger table is spread over more pages on disk / in memory.

SQL Server only attempts to read the data it absolutely requires, however it will always load an entire page at a time (8 KB). Even when exactly the same amount of data is required as the output of the query, if that data is spread over more pages then more IO is required.
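To put rough numbers on this, here is a back-of-the-envelope estimate in Python. It is deliberately simplified: it treats the full 8 KB as usable, ignores page headers and per-row overhead, and assumes every varchar(10) column is filled to 10 bytes. The row count of one million and the function name are assumptions for illustration only.

```python
import math

PAGE_BYTES = 8 * 1024  # the 8 KB page size mentioned above


def pages_for_scan(row_count, row_bytes, usable_bytes=PAGE_BYTES):
    """Estimate how many pages a full scan must read, ignoring all storage overhead."""
    rows_per_page = usable_bytes // row_bytes
    if rows_per_page == 0:
        raise ValueError("row does not fit on a single page in this simplified model")
    return math.ceil(row_count / rows_per_page)


ROWS = 1_000_000                          # assumed row count
wide_pages = pages_for_scan(ROWS, 300)    # 30 columns x 10 bytes each
narrow_pages = pages_for_scan(ROWS, 60)   # 6 columns x 10 bytes each

print("30-column table:", wide_pages, "pages")
print("6-column table: ", narrow_pages, "pages")
print("ratio:", round(wide_pages / narrow_pages, 2))
```

Under these assumptions a full scan of the wider table reads roughly five times as many pages, which is the only place the extra columns cost anything when no index is used.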

That said, SQL Server is incredibly efficient with its data access, and so you are very unlikely to see a noticeable impact on performance except in extreme circumstances.

Besides, it is also likely that your query will be run against the index rather than the table anyway, and so with indexes exactly the same size the change is likely to be 0.

Justin
4

Unless you have a very wide column set difference with no index being used (thus a table scan), you should see little difference in performance. That being said, it is always useful/beneficial to return as few columns as possible to satisfy your needs. The catch here is that a greater benefit can be had by returning all the columns you need in one query than by making a second database fetch for the other columns.

  • Get what you need
  • Avoid a second database query on the same table for the same rows
  • Use an index on the filtering column(s) (those that restrict rows in the WHERE clause)
  • Restrict columns if you do not need them, to enhance database server memory efficiency/paging
Mark Schultheiss
  • This being said, SQL Server will, at times, grab an entire table/index into memory and then work on it, which would make the column numbers moot - trying to find the reference. – Mark Schultheiss Sep 08 '10 at 13:27
2

There will be no performance difference based on the column position. Now the construction of the table is a different story e.g. number of rows, indexes, number of columns etc.

The scenario you are talking about where you are comparing the position of the column in the two tables is like comparing apples to oranges almost, because there are so many different variables besides the column position.

kemiller2002
1

Depends on the width of the table (bytes per row), how many rows are in the table, and whether there are indices on the columns used by the query. No definitive answer without that info. However, the more columns there are in the table, the more likely it is to be wide. But the effect of a proper index is much more significant than the effect of the table size.

Charles Bretana
1

Since you specified you are using the WHERE clause, it will depend on how many rows are returned. If the column in your WHERE clause is UNIQUE or a PRIMARY KEY then the difference is almost non-existent. In databases that support it (PostgreSQL, for example), you can put EXPLAIN ANALYZE in front of your SELECT statement to view the planning time and execution time values, and then you can compare your queries.

Noah Kanyo