SELECTing non-indexed column increases 'sending data' 25x - why and how to improve?

Question

Given this table on local MySQL instance 5.1 with query caching off:

show create table product_views\G
*************************** 1. row ***************************
       Table: product_views
Create Table: CREATE TABLE `product_views` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `dateCreated` datetime NOT NULL,
  `dateModified` datetime DEFAULT NULL,
  `hibernateVersion` bigint(20) DEFAULT NULL,
  `brandName` varchar(255) DEFAULT NULL,
  `mfrModel` varchar(255) DEFAULT NULL,
  `origin` varchar(255) NOT NULL,
  `price` float DEFAULT NULL,
  `productType` varchar(255) DEFAULT NULL,
  `rebateDetailsViewed` tinyint(1) NOT NULL,
  `rebateSearchZipCode` int(11) DEFAULT NULL,
  `rebatesFoundAmount` float DEFAULT NULL,
  `rebatesFoundCount` int(11) DEFAULT NULL,
  `siteSKU` varchar(255) DEFAULT NULL,
  `timestamp` datetime NOT NULL,
  `uiContext` varchar(255) DEFAULT NULL,
  `siteVisitId` bigint(20) NOT NULL,
  `efficiencyLevel` varchar(255) DEFAULT NULL,
  `siteName` varchar(255) DEFAULT NULL,
  `clicks` varchar(1024) DEFAULT NULL,
  `rebateFormDownloaded` tinyint(1) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `siteVisitId` (`siteVisitId`,`siteSKU`),
  KEY `FK52C29B1E3CAB9CC4` (`siteVisitId`),
  KEY `rebateSearchZipCode_idx` (`rebateSearchZipCode`),
  KEY `FIND_UNPROCESSED_IDX` (`siteSKU`,`siteVisitId`,`timestamp`),
  CONSTRAINT `FK52C29B1E3CAB9CC4` FOREIGN KEY (`siteVisitId`) REFERENCES `site_visits` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=32909504 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

This query takes ~3s:

    SELECT pv.id, pv.siteSKU
      FROM product_views pv 
CROSS JOIN site_visits sv 
     WHERE pv.siteVisitId = sv.id 
       AND pv.siteSKU = 'foo' 
       AND sv.siteId = 'bar' 
       AND sv.postProcessed = 1 
       AND pv.timestamp >= '2011-05-19 00:00:00' 
       AND pv.timestamp < '2011-06-18 00:00:00';

But this one (non-indexed column added to SELECT) takes ~65s:

    SELECT pv.id, pv.siteSKU, pv.hibernateVersion 
      FROM product_views pv 
CROSS JOIN site_visits sv 
     WHERE pv.siteVisitId = sv.id 
       AND pv.siteSKU = 'foo' 
       AND sv.siteId = 'bar' 
       AND sv.postProcessed = 1 
       AND pv.timestamp >= '2011-05-19 00:00:00' 
       AND pv.timestamp < '2011-06-18 00:00:00';

Nothing in 'where' or 'from' clauses is different. All the extra time is spent in 'sending data':

mysql> show profile for query 1;
+--------------------+-----------+
| Status             | Duration  |
+--------------------+-----------+
| starting           |  0.000155 |
| Opening tables     |  0.000029 |
| System lock        |  0.000007 |
| Table lock         |  0.000019 |
| init               |  0.000072 |
| optimizing         |  0.000032 |
| statistics         |  0.000316 |
| preparing          |  0.000034 |
| executing          |  0.000002 |
| Sending data       | 63.530402 |
| end                |  0.000044 |
| query end          |  0.000005 |
| freeing items      |  0.000091 |
| logging slow query |  0.000002 |
| logging slow query |  0.000109 |
| cleaning up        |  0.000004 |
+--------------------+-----------+
16 rows in set (0.00 sec)

I understand that using a non-indexed column in where clause would slow things down, but why here? What can be done to improve the latter case - given that I will actually want to SELECT(*) from product_views?

EXPLAIN Output

explain extended select pv.id, pv.siteSKU from product_views pv cross join site_visits sv where pv.siteVisitId=sv.id and pv.siteSKU='foo' and sv.sit eId='bar' and sv.postProcessed=1 and pv.timestamp>='2011-05-19 00:00:00' and pv.timestamp<'2011-06-18 00:00:00';
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+-----
-----+--------------------------+ | id | select_type | table | type   | possible_keys                          | key                  | key_len | ref | rows  | filt ered | Extra            |
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+-----
-----+--------------------------+ |  1 | SIMPLE      | pv    | ref    | siteVisitId,FK52C29B1E3CAB9CC4,FIND_UNPROCESSED_IDX | FIND_UNPROCESSED_IDX | 258     | const                | 41810 |   10
0.00 | Using where; Using index | |  1 | SIMPLE      | sv    | eq_ref | PRIMARY,post_processed_idx             | PRIMARY              | 8       | clabs.pv.siteVisitId |     1 |   10
0.00 | Using where              |
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+-----
-----+--------------------------+ 2 rows in set, 1 warning (0.00 sec)

mysql> explain extended select pv.id, pv.siteSKU, pv.hibernateVersion from product_views pv cross join site_visits sv where pv.siteVisitId=sv.id and pv.siteSKU= 'foo' and sv.siteId='bar' and sv.postProcessed=1 and pv.timestamp>='2011-05-19 00:00:00' and pv.timestamp<'2011-06-18 00:00:00';
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+-----
-----+-------------+ | id | select_type | table | type   | possible_keys                          | key                  | key_len | ref | rows  | filt ered | Extra       |
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+-----
-----+-------------+ |  1 | SIMPLE      | pv    | ref    | siteVisitId,FK52C29B1E3CAB9CC4,FIND_UNPROCESSED_IDX | FIND_UNPROCESSED_IDX | 258     | const                | 41810 |   10
0.00 | Using where | |  1 | SIMPLE      | sv    | eq_ref | PRIMARY,post_processed_idx             | PRIMARY              | 8       | clabs.pv.siteVisitId |     1 |   10
0.00 | Using where |
+----+-------------+-------+--------+-----------------------------------------------------+----------------------+---------+----------------------+-------+-----
-----+-------------+ 2 rows in set, 1 warning (0.00 sec)

UPDATE1: Splitting into 2 queries brings total time down to ~30s range

Not sure why, but splitting the latter query into the following reduces lat. from 65s to ~30s:

1) SELECT pv.id .... //from, where clauses same as above

2) SELECT * FROM product_views where id in (idList); //idList

UPDATE2: TABLE SIZE

table has on the order of 10M rows
query returns about 3k rows

Different columns in the SELECT can really impact retrieval. I wouldn't expect such an issue with BIGINT -- it usually binary (BLOB, etc) that really highlight the fact. Time to execute can be dependent on load on the system, and network between depending on how you are testing. Why do you not want a query cache? — OMG Ponies, Jun 19 '11 at 16:17
am interested in 1st query execution perf because of app use case (table content will almost always change between queries -> query cache won't be hit anyway) — Nikita, Jun 19 '11 at 16:25
There is no reason to have such huge table. Pease read about [3NF](http://en.wikipedia.org/wiki/Third_normal_form). — tereško, Jun 19 '11 at 16:35
@teresko the table looks normalized to me. how would you reduce it in size? — Nikita, Jun 19 '11 at 16:43
it seems that FIND_UNPROCESSED_IDX is not the best index to use, it scans 40000+ rows. try forsing other indexes to see if they will have bettwe performance. — Maxim Krizhanovsky, Jun 19 '11 at 17:36
This post is highly related: https://stackoverflow.com/questions/609343/what-are-covering-indexes-and-covered-queries-in-sql-server — colossatr0n, Aug 20 '20 at 22:14

score 4 · Answer 1 · edited Aug 21 '20 at 07:37

4

When you select only indexed columns, MySQL does read only the index, and does not need to read the table data. This, as far as I remember, is called index-covered query. However, when there are columns, that are not present in the used index, MySQL needs to open the table and read the data from it. This is the reason index-covered queries to be much faster.

See Using Covering Indexes to Improve Query Performance.

As for the improvement, how many rows are in the table, how much the query returns and what is your buffer pool size, how much RAM is available, etc.?

edited Aug 21 '20 at 07:37

colossatr0n

2,215
1
12
18

answered Jun 19 '11 at 16:15

Maxim Krizhanovsky

26,265
5
59
89

+1 This is only true for InnoDB though. See here: http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/ – Johan Jun 19 '11 at 16:29

score 2 · Answer 2 · edited Jun 19 '11 at 16:35

2

From what I have read about show profile, 'sending data' is a portion of execution process, and has almost nothing to do with sending actual data to the client. You can take a look on this thread
Also, mysql docs says about "Sending data" :

The thread is reading and processing rows for a SELECT statement, and sending data to the client. Because operations occurring during this state tend to perform large amounts of disk access (reads), it is often the longest-running state over the lifetime of a given query.

In my opinion, mysql would better not mix together "reading and processing rows for a SELECT statement" and "sending data" in one state, especially in state called "sending" data" which causes lots of confusion.

edited Jun 19 '11 at 16:35

rid

61,078
31
152
193

answered Jun 19 '11 at 16:05

a1ex07

36,826
12
90
103

OK, you're suggesting the 'executing' phase is lengthy? If so, the question still stands though - why? – Nikita Jun 19 '11 at 16:12
Check what `EXPLAIN` shows. I think your first query don't have to do a lookup for a column (columns) which are not part of indexes, but scans/seeks indexes. – a1ex07 Jun 19 '11 at 16:17
there is no difference in output of 'explain extended' for the two queries – Nikita Jun 19 '11 at 16:24
Can you add explain output for both queries to your post? – a1ex07 Jun 19 '11 at 16:32
1

@Nikita : The output you provided shows an important difference : `Using where; Using index` for the first query and just `Using where` in second. `Using index` means that "The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row" (from mysql). In the second case, it has to do a lookup to retrieve the value of non-indexed column. As it was already mentioned, you have a covered index for the first query, and non-covered index in the second. – a1ex07 Jun 19 '11 at 20:49
@a1ex07 excellent pt - I should not have missed that. still quite surprising that retrieving 3k rows - even from a 15m table - with known PKs drives query time from 3s to 64s. – Nikita Jun 20 '11 at 18:46

score 0 · Answer 3 · answered Jun 19 '11 at 17:29

I'm don't know MySQL internals at all, but Darhazer's explanation looks like the winner to me. When the non-indexed field is added, the entire row must be retrieved. And your rows are very wide. I can't quite tell from the names how (if at all) it is denormalized, but I suspect it is. site name and site sku smell like they belong in a site lookup table with an FK. rebates found amount and rebates found count sound like statistics that should be coming from a join to a separate product rebate table. etc.

SELECTing non-indexed column increases 'sending data' 25x - why and how to improve?

3 Answers3