0

For example there are 20 fields in a record, which includes 5 indexed fields out of 20 fields. Given proper indexes on columns are set up and the data will be retrieved with the indexed field. I want to discuss 2 situations below.

  1. retrieving a field from a record
  2. retrieving a entire record

The only difference I know is that in case 1, the system uses small amount of data, so it spent less on the bus traffic. But when it comes to retrieving time, I'm not sure in these 2 cases if there will be any difference in terms of hardware operation, because I think the main cost on retrieving task on DB is finding the record regardless of how many fields. Is this correct?

Marcus Thornton
  • 5,955
  • 7
  • 48
  • 50
  • Thanks! I think my issue and this one is a little bit different in the setting. In that case column1, column2, column3... are all the fields which means that the retrieving targets are the same. But what I want to discuss is retrieving a field from a record vs retrieving a entire record. The retrieving targets are different. – Marcus Thornton Apr 17 '13 at 04:36
  • This is **highly** vendor-specific. For SQL Server, it will always read and write data in 8k pages - so even if you just change a `BIT` you'll have 8K being read and 8k being written. – marc_s Apr 17 '13 at 04:50
  • There is valuable and correct advice about this situation in the suggested duplicate question (covering indexes, etc). If you need to select fields that are from a single index, then maybe the DBMS will do an index-only scan without fetching the data rows from the table — that will be faster because of fewer I/Os. If you need all the data, it will usually have to read rows from the table (there won't be an index covering all columns) as well as any index used to speed up the query. If you don't need all the data, it is worth trying to minimize what you select; it takes time to transfer it too. – Jonathan Leffler Apr 18 '13 at 00:30

2 Answers2

0

Assuming you are retrieving from a heap-based table and your WHERE clause is identical in both cases:

  1. It matters whether the field(s) being retrieved is in the index or not. If it's in the index, the DBMS will not need to access the table heap - this is called index-only scan. If it's not in the index, the DBMS must access the heap page in which the the field resides, possibly requiring additional I/O if not already cached.
  2. If you are reading the whole row, it is less likely all of its fields are covered by the index the DBMS query planner chose to use, so it is more likely you'll pay the I/O cost of the table heap access. This is not so bad for a single row, but can absolutely destroy performance if many rows are retrieved and index's clustering factor is bad1.

The situation is similar but slightly more complicated for clustered tables, since indexes tend to cover PK fields even when not explicitly mentioned in CREATE INDEX, and the "main" portion of the table cannot (typically) be accessed directly, but through an index seek.

On top of that, transferring more data puts more pressure on network bandwidth, as you already noted.

For these reasons, always try to select exactly what you need and no more.


1 A good query optimizer will notice that and perform the full table scan because it's cheaper, even though the index is available.

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
0

Reading several material I came to conclusions:

  1. Select only those fields required when performing a query.
  2. If only indexed field will be scanned, the DB will perform index-only searching, which is fast.
  3. When trying to fetch many rows which includes un-indexed fields, the worst case is that the query will perform as many block I/Os as number of rows, which is very expensive cost. So the better way is to perform full table scan because the total number of block I/Os equals to the total number of blocks, which could be much smaller than the number of rows.
Marcus Thornton
  • 5,955
  • 7
  • 48
  • 50