0

I have a rather large table (1,8GB, 25M records), with indexes on every important column (those I use WHERE or other selectors with). The engine is MyISAM, I've heard people recommend switching to InnoDB but many others said MyISAM is faster with large amounts of data in terms of counting (and InnoDB better for safe transactions).

Now, sometimes, I do a set of simple stats queries (so I see where we stand):

(A) SELECT COUNT(username) FROM superLargeTable
(B) SELECT COUNT(username) FROM superLargeTable WHERE firstname IS NOT NULL
(C) SELECT COUNT(username) FROM superLargeTable WHERE firstname IS NULL

If the table has been updated recently (and the caches are invalid), these are the query times measured by the slow query log:

A: 71 seconds (Lock:0s) (Rows:1) B: 47 seconds (Lock:0s) (Rows:1) C: 20 seconds (Lock:0s) (Rows:1)

A second run, obviously (since it's cached then) results in almost instant data.

I have indexes on all the relevant columns (pretty much on almost all of them). I still juggle with recommendations saying that indexes on many columns is very bad design, on the other hand, I totally need it since the queries would all time out otherwise.

I do have quite a few burst-type inserts and updates, firing at the table on irregular intervals (sometimes every minute, with up to 5000 inserts/updates fast-burst), sometimes with a few minutes in between.

The data needs to be in sync, so I can't split up into a read-only slave and just work with that.

Any idea why a simple COUNT takes so long? Is there a secret alternative command that tells me how many records are in a row, with a single, certain criteria?

Really puzzled. Server is an 8GHz, 8/16GB RAM, 160GB disk (10% in use or so) and the queries are all run locally and not transferred over the web.

Willing to help with more data/information if required.

Edit 1: EXPLAIN command for query A returns this:

select_type=SIMPLE
type=index
possible_keys=NULL
key=followers (this is the actual username column in the imaginary query above)
key_len=9
ref=NULL
rows=24408162
Extra=Using index

Edit 2: @theHe recommended a great tool: mysqltuner.pl – here are the results (the issues):

[!!] Total fragmented tables: 15
[!!] Maximum possible memory usage: 30.7G (767% of installed RAM)
[!!] Query cache efficiency: 8.6% (5M cached / 59M selects)
[!!] Joins performed without indexes: 19031
[!!] Temporary tables created on disk: 48% (993K on disk / 2M total)
[!!] Table cache hit rate: 0% (64 open / 27K opened)

And these are the recommendations:

-------- Recommendations -----------------------------------------------------
General recommendations:
Run OPTIMIZE TABLE to defragment tables for better performance
Reduce your overall MySQL memory footprint for system stability
Adjust your join queries to always utilize indexes
When making adjustments, make tmp_table_size/max_heap_table_size equal
Reduce your SELECT DISTINCT queries without LIMIT clauses
Increase table_cache gradually to avoid file descriptor limits
Variables to adjust:
*** MySQL's maximum memory usage is dangerously high ***
*** Add RAM before increasing MySQL buffer variables ***
query_cache_limit (> 10M, or use smaller result sets)
join_buffer_size (> 128.0K, or always use indexes with joins)
tmp_table_size (> 32M)
max_heap_table_size (> 16M)
table_cache (> 64)

James Stone
  • 487
  • 7
  • 16
  • 1
    check your query with EXPLAIN and maybe append the result? – TheHe Aug 23 '12 at 09:00
  • "*I have indexes on all the relevant columns*" -- but MySQL will only use one index. Do you have a suitable compound index, containing each column on which your query performs a lookup in a sensible order? – eggyal Aug 23 '12 at 09:03
  • @thehe I am checking with EXPLAIN in a second, will report back. – James Stone Aug 23 '12 at 09:05
  • Also, `COUNT(column)` is slower than `COUNT(*)` (it has to scan `column` so that `NULL` values are excluded). Do you need to do that? If so, do you have an index on `username`? – eggyal Aug 23 '12 at 09:05
  • @eggyal this is the current index design, i was under the impression mysql would pick whichever index it needs to query? http://i.imgur.com/TBprf.png – James Stone Aug 23 '12 at 09:05
  • @eggyal the COUNT(*) being faster is a good one, I always (for whatever reason) thought it'd be the other way around. I'll modify the code to include this. not sure by how far this will improve it... – James Stone Aug 23 '12 at 09:06
  • COUNT(1) is the only real solution guys ;) [when you don't use grouping] – TheHe Aug 23 '12 at 09:06
  • No, MySQL will try to pick the best index, but it can only pick one of them. If you don't have an optimal index, it will resort to using a suboptimal one. – eggyal Aug 23 '12 at 09:06
  • "*[`COUNT(*)` is optimized to return very quickly if the `SELECT` retrieves from one table, no other columns are retrieved, and there is no `WHERE` clause.](http://dev.mysql.com/doc/en/group-by-functions.html#function_count)*" – eggyal Aug 23 '12 at 09:07
  • @eggyal think we're both correct --> http://stackoverflow.com/questions/1221559/count-vs-count1 – TheHe Aug 23 '12 at 09:08
  • @TheHe: That's for SQL Server, not MySQL. – eggyal Aug 23 '12 at 09:09
  • @eggyal: so for query A, the COUNT(*) would indeed help since I have no WHERE and no JOINs ... and I could get rid of query B or C by subtracting from result of query A ... – James Stone Aug 23 '12 at 09:09
  • 1
    @RomanMittermayr: If you're not interested in when `username` is `NULL`, I'd suggest doing `COUNT(*)` for query A, `COUNT(firstname)` for query B (no `WHERE` clause) and just subtract the two for query C. – eggyal Aug 23 '12 at 09:12
  • i also just posted the results of the query A with EXPLAIN – James Stone Aug 23 '12 at 09:12
  • server with default settings? check with www.mysqltuner.pl [simple wget and execute] – TheHe Aug 23 '12 at 09:14
  • maybe cache-sizes for index etc. are totally not okay... explain explains, keys are used... soooooo.... check it with tuner. – TheHe Aug 23 '12 at 09:14
  • this is my my.cnf http://pastebin.com/6nnKrsqT – James Stone Aug 23 '12 at 09:14
  • THAT TOOL IS AWESOME. wtf why did no one ever mention this before. Some of the issues it reported: Total fragmented tables: 15, Maximum possible memory usage: 30.7G (767% of installed RAM), Query cache efficiency: 8.6% (5M cached / 59M selects), Joins performed without indexes: 19031, Temporary tables created on disk: 48% (993K on disk / 2M total), Table cache hit rate: 0% (64 open / 27K opened) – James Stone Aug 23 '12 at 09:22
  • how do I give both of you the karma you deserve? eggyal had a great suggestion with count(*) instead of specific column and thehe enlighted me with a ridiculously helpful tool ... – James Stone Aug 23 '12 at 09:24
  • 1
    As a note: the remark that myisam is faster than innodb no longer is valid. Its more situational now, you should try it both ways. Both read and write operations vary, but most of the time, innodb seems faster in all scenarios. – Tuncay Göncüoğlu Sep 18 '14 at 07:01

0 Answers0