0

I'm trying to figure out how i can improve the performance of this query and i believe it may be my indexes; some of my thoughts are that date may be causing the poor performance or that i have the indexs in the order wrong. Also are there any other suggestions anyone has on how to improve the speed that are not index related? Thanks, i look forward to any input!

Here's what I've tried so far

 ALTER TABLE data ADD INDEX(data_timestamp, first,last);
 ALTER table data add index(first);
 ALTER table data add index(first);
 ALTER TABLE data add index (data_timestamp);

The following query(the second one) below runs a subquery for each row of a database in order to get the previous average at the instant of each point

select count(*) from data where data_timestamp like '2015-01-01%'; -> 362855

select (select sum(first*last) / sum(last)
FROM data t2
WHERE data_timestamp like '2015-12-18%'
AND t2.data_timestamp <= t1.data_timestamp
), t1.*
FROM data t1
WHERE data_timestamp like '2015-12-18%';
kevinn2065
  • 325
  • 1
  • 3
  • 12

3 Answers3

2

For optimum performance, you want an index range scan operation for the data_timestamp column. The predicate in the query of the form:

  WHERE data_timestamp LIKE '2015-12-18%' 

is forcing MySQL to evaluate EVERY value of data_timestamp in the table, effectively converting the datetime/timestamp value into a string, and then performing a string comparison on the converted value.

If we use a predicate with a comparison to datetime values, then MySQL can make more effective use of an index that has data_timestamp as a leading column. For example:

  WHERE data_timestamp >= '2015-12-18'
    AND data_timestamp <  '2015-12-18' + INTERVAL 1 DAY

The EXPLAIN output for a query using the LIKE pattern will show

type
------
index

That shows the query can make use of an index. But it's doing a full scan of the index, looking at every row in the index. But a much more efficient pattern is available. We can allow MySQL to quickly eliminate vast swaths of rows in the index from being considered, by using a range scan operation. A query with a predicate as in the second example will (should) show:

type
------
range

That's going to improve performance for a query that is pulling a relatively small number of rows from a large set.


More explanation, in case I didn't make this clear. Writing:

  WHERE ts_col LIKE '2015-12-18%' 

is effectively the same as writing

  WHERE CONVERT(ts_col,CHAR(18)) LIKE '2015-12-18%' 

And that forces MySQL to perform the CONVERT operation on the value in the ts_col for every row in the table.

BOTTOM LINE

Don't force unnecessary datatype conversions of columns from the table. Instead, compare columns to their native datatypes.

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • Excellent response, it was ignoring the index, this is the correct solution(Using the between statement) Query time went from 26 hours to .3 seconds – kevinn2065 Dec 19 '15 at 23:27
  • And the only useful index (for this query, after changing the `WHEREs`) is `INDEX(data_timestamp)`. – Rick James Dec 19 '15 at 23:54
  • @Rick James: OP query can make effective use of *any* index that has `data_timestamp` as the leading column. The correlated subquery could make use of an index `on (data_timestamp,first,last)`. That index would be a covering index for the subquery. The subquery could be satisfied entirely from the index, without lookups to pages in the underlying table. With that composite index created, an index on just `data_timestamp` would be redundant. – spencer7593 Dec 20 '15 at 22:36
0

You can do an "explain" query to the database to figure out what is happening. Just write "explain" before any query.

Is not necessary to add indexes for first and last. You're only searching in data_timestamp fields so that's the only one index you need.

On the other hand, you might be having problems regarding to using "like %" in date columns. Check if there are another alternatives to do the same. If data_timestamp is a text column, you should add a full text index to the field. If data_timestamp is a date column, use "between" instead "like". "explain" tells you which index is used by the query.

Polak
  • 656
  • 13
  • 22
0

For this query you need only ALTER TABLE tic_data add index (data_timestamp). But data_timestamp must be of CHAR or VARCHAR type (you scan it with LIKE <string%>).

i486
  • 6,491
  • 4
  • 24
  • 41