How to improve "select min(my_col)" query in MySQL without adding and index

Question

The query below takes about a minute to run on my MySQL instance (running on a fairly beefy machine with 64G memory, 2T disc, 2.30Ghz CPU with 8 cores and 16 logical, and the query is running on localhost). This same query runs in less than a second on a SQL Server database I have access to. Unfortunately, I do not have access to the SQL Server host or the DBA, etc.

select min(visit_start_date)
from visit_occurrence;

The table has been set to ENGINE=MyISAM and default-storage-engine=INNODB and innodb_buffer_pool_size=16G are set in my.ini.

Is there some configuration I could be missing that would cause this query to run so slowly on MySQL? How can I fix it?

I have a large number of tables and queries I will need to support so I would really like to be able to fix this issue globally rather than having to create indexes everywhere I have slow queries.

The SQL Server database does not seem to have an index on the column being queried as shown below.

EDIT:

Untagged MS Sql Server, I had tagged it hoping for the help of our MS Sql Server colleagues with information that Sql Server had some way of structuring data and/or queries that would make this type of query run faster on that platform v other such as MySql
Removed image of code to more closely conform with community standards
You never know if there is a magic go-faster button if you don't ask (ENGINE=MyISAM is sometimes kind of like a magic go-faster button for some queries in MySql). I'm kind of fishing for a potential hardware or clustering solution here. Is Apache Ignite a potential solution here?

Thanks again to the community for all of your support and help. I hope this fixes most of the issues that have been raised for this post.

SECOND EDIT: Is the partitioning/sharding described in the links below a potential solution here?

https://user3141592.medium.com/how-to-scale-mysql-42ebd2841fa6

https://dev.mysql.com/doc/refman/8.0/en/partitioning-overview.html

THIRD EDIT: A note on community standards.

Part of our community standards is explicitly to be welcoming, inclusive, and to be nice.

https://stackoverflow.blog/2018/04/26/stack-overflow-isnt-very-welcoming-its-time-for-that-to-change/?fbclid=IwAR1gr6r2qmXs506SAV3H_h6H8LoFy3mlXucfa-fqiiEXMHUR3aF_tdoZGsw

https://meta.stackexchange.com/questions/240839/the-new-new-be-nice-policy-code-of-conduct-updated-with-your-feedback).

The MS Sql Server tag was used here as one of the systems I'm comparing is MS Sql Server. We're really working with very limited information here. I have two systems: My MySql system, which is knowable as I'm running it, and the MS Sql Server running the same database in someone else's system that I have very little information about (all I have is a read only sql prompt). I am comparing apples and oranges: The same query runs well on the orange (MS Sql Server) and does not run well on the apple (My MySql instance). I'd like to know why so I can make an informed decision about how to get my queries to run in a reasonable amount of time. How do I get my apple to look like an orange? Do I switch to MS Sql Server? Do I need to deploy on different hardware? Is the other system running some kind of in memory caching system on top of their database instance? Most of these possibilities would require a non trivial amount of time to explore and validate. So yes, I would like help from MS Sql Server experts that might know if there are caching options, transactional v warehouse options, etc. that could be set that would make a world of difference, that would be magic go-fast buttons.

The magic go-fast button comment was perhaps a little bit condescending.

The picture showing the indexes was shown as I was just trying to make the point that the other system does not seem to have an index on the column being queried. I this case a picture was worth a thousand words.

As per the question guide, please do not post images of code, data, error messages, etc. - copy or type the text into the question. Please reserve the use of images for diagrams or demonstrating rendering bugs, things that are impossible to describe accurately via text. — Dale K, Feb 08 '22 at 20:48
"I would really like to be able to fix this issue globally rather than having to create indexes everywhere I have slow queries" - actually this is part of database design, and a key way to ensure performance. There isn't a magical go-faster setting to make all queries run faster. — Dale K, Feb 08 '22 at 20:50
Those images look like SSMS but you talk about MySQL; these are conflicting things. — Thom A, Feb 08 '22 at 21:00
Yes, part of the question is considering if this could possibly be a difference between MySql and MS Sql Server — John, Feb 08 '22 at 21:06
MySql and SQL Server are radically different products. Add the execution plan for the SQL Server query using [PasteThePlan](https://www.brentozar.com/pastetheplan) and the Explain output for the same query in MySql. — Stu, Feb 08 '22 at 21:07
@John the question itself is not about SQL Server... thats background information. And actually completely irrelevant because the query engines of each product have nothing in common. You want to improve your performance on MySQL, therefore you question is about MySQL and you need to provide the relevant details about your database and query. Adding details about the same query in SQL Server adds nothing. Tagging SQL Server just annoys people who are trying to answer SQL Server questions. — Dale K, Feb 08 '22 at 21:09
Not sure if it will be faster or not... would have to test... but maybe `SELECT visit_start_date FROM visit_occurrence ORDER BY visit_start_Date asc LIMIT 1` There's more than 1 way to do something and I don't know your envionrment so optimizer may do something different.... but I suspect the order by will be just as expensive. — xQbert, Feb 08 '22 at 21:42
@xQbert - Think of it this way... Either formulation would have to check all the rows to find either the "min" or the "first". So even if they are executed differently, the effort (which is dominated by fetching and looking at _every_ `visit_start_Date`) is about the same. With the obvious index (`INDEX(date)`), it is then up to the Optimizer to understand that both "min" and "first" can be _very_ quickly resolved by looking at the "first" "row" in the INDEX BTree. — Rick James, Feb 08 '22 at 21:48
See also https://use-the-index-luke.com/sql/partial-results/top-n-queries, you need an index to make this efficient — Charlieface, Feb 08 '22 at 22:03
@Charlieface - My version of [_Groupwise-Max_](http://mysql.rjweb.org/doc.php/groupwise_max) -- aimed at optimization for large tables. — Rick James, Feb 08 '22 at 22:29
MariaDB's "Columnstore" Engine can very quickly find min, max, avg, etc, for _every_ column. — Rick James, Feb 08 '22 at 22:33
[_SlowLog_](http://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog) — Rick James, Feb 08 '22 at 22:35
@RickJames I get it. I just know different optimizers can do different things and with engines like Athena and users processing data off of flat files instead of tables... what is done has to be tested as things work differently. I know this is mySQL so it shouldn't be that way in this case... — xQbert, Feb 08 '22 at 22:36

Rick James · Answer 1 · 2022-02-08T22:26:38.293

3

If the table says ENGINE=MyISAM, then that is what counts. In almost all cases, this is a bad choice. innodb_buffer_pool_size=16G is not relevant except that it robs memory from MyISAM.

default-storage-engine=INNODB is relevant only when creating a table explicitly specifying the ENGINE=.

Are some of your tables MyISAM and some are InnoDB? How much RAM do you have?

Most performance solutions necessarily involve an INDEX. Please explain why you can't afford an index. It could turn that query into less than 10ms, regardless of the number of rows in the table.

Sorry, but I don't accept "rather than having to create indexes everywhere I have slow queries".

Changing tables from MyISAM to InnoDB will, in some cases help with performance. Suggest you change the engine as you add the indexes.

Show us some more queries, we can help you decide what indexes are needed. select min(visit_start_date) from visit_occurrence; needs INDEX(date); other queries may not be so trivial. Do not fall into the trap of "indexing every column".

More

In MySQL...

A single connection only uses one core, so more cores only helps when you have more connections. (Some tiny exceptions exist in MySQL 8.0.)
Partitioning rarely helps with performance; do use that without getting advice. (PS: BY RANGE is perhaps the only useful variant.)
Replication is for read-scaling (and backup and ...)
Sharding is for write-scaling. It requires a bunch of extra architectural things -- such as routing queries to the appropriate servers. (MariaDB has Spider and FederatedX as possible tools.) In any case, sharding is a non-trivial undertaking.
Clustering is for HA (High Availability, auto-failover, etc), while helping some with read and write scaling. Cf: Galera, InnoDB Cluster.
Hardware is rarely more than a temporary solution to performance issues.
Caching leads to potentially inconsistent results, so beware. Also, consider my mantra "don't bother putting a cache in front of a cache".

(I can advise further on any of these topics.)

edited Feb 08 '22 at 22:26

answered Feb 08 '22 at 21:35

Rick James

135,179
13
127
222

Thanks Rick! It't not that I can't afford an index, its that I have lots of tables and lots of queries I'd like to support and its going to be a lot of work to design and implement all of those indexes. It seems to work on the other system without having to create these indexes so I'm trying to figure out what I can do to avoid all of the work of having to design and create the indexes. – John Feb 08 '22 at 21:44
I agree with your comment about not falling into the trap of "indexing all queries" :) – John Feb 08 '22 at 21:45
1

"its going to be a lot of work to design and implement all of those indexes". That's part of what you have to do when working with database apps. – Andy Lester Feb 08 '22 at 21:47
1

Just keep in mind. Without an index many queries will have to scan the entire table, and god help you if you want to sort. There is no way around this (or there is, and it's called an index) – davolfman Feb 08 '22 at 21:50
1

@AndyLester - MySQL has no auto-index-generator; apparently, SQL Server does. Sorry. I enjoy specifying optimal indexes; feed me the Selects. Or use my [_Index Cookbook_](http://mysql.rjweb.org/doc.php/index_cookbook_mysql) – Rick James Feb 08 '22 at 21:50
1

@AndyLester - Oops, I meant to say "indexing every column". – Rick James Feb 08 '22 at 21:51
@RickJames You do not want to use index every column. If you literally indexed every column, it would make your DB very slow when updating. – Andy Lester Feb 08 '22 at 22:02
1

@AndyLester - True. "But I indexed every column" is a common statement from novices. It slows down insert/update/delete. Furthermore, _most_ of the indexes may never be used. And, a composite index is often much better than a single-column index. That makes index creation more intellectually challenging. – Rick James Feb 08 '22 at 22:11
@RickJames Sorry, I got mixed-up, I thought I was replying to OP, not you. – Andy Lester Feb 08 '22 at 22:21
@AndyLester - They'll see it. Anyway, I needed to fix my typo. – Rick James Feb 08 '22 at 22:28

score 0 · Answer 2 · answered Feb 10 '22 at 14:56

Whether in MyISAM or InnoDB. or even SQL Server, your query

select min(visit_start_date) from visit_occurrence;

can be satisfied almost instantaneously by this index, because it uses a so-called loose index scan.

CREATE INDEX visit_start_date ON visit_occurrence (visit_start_date);

A query with an aggregate function like MIN() is always a GROUP BY query. But if the GROUP BY clause isn't present in the SQL statement, the server groups by the entire table.

You mentioned a query that can be satisfied immediately when using MyISAM. That's SELECT COUNT(*) FROM whatever_table. Behind the scenes MyISAM keeps table metadata showing the total number of rows in the table, so that query comes back right away. The transactional storage engine InnoDB doesn't do that. It supports so much concurrency that its designers didn't include the total row count in their metadata, because it would be wrong in so many circumstances that it wasn't worth the risk.

Index design isn't a black art. But it is an art informed by the kind of measurements we get from EXPLAIN (or ANALYZE or EXPLAIN ANALYZE). A basic truth of database-driven apps (in any make of database server) is that indexing needs to be revisited as the app grows. The good news: changing, adding, or dropping indexes doesn't change your data.

How to improve "select min(my_col)" query in MySQL without adding and index

2 Answers2