5

We have a table with about 25,000,000 rows called 'events' having the following schema:

TABLE events
- campaign_id   : int(10)
- city      : varchar(60)
- country_code  : varchar(2)

The following query takes VERY long (> 2000 seconds):

SELECT COUNT(*) AS counted_events, country_code
FROM events
WHERE campaign_id` in (597) 
GROUPY BY city, country_code
ORDER BY counted_events

We found out that it's because of the GROUP BY part.

There is already an index idx_campaign_id_city_country_code on (campaign_id, city, country_code) which is used.

Maybe someone can suggest a good solution to speed it up?

Update:

'Explain' shows that out of many possible index MySql uses this one: 'idx_campaign_id_city_country_code', for rows it shows: '471304' and for 'Extra' it shows: 'Using where; Using temporary; Using filesort' –

Here is the whole result of EXPLAIN:

  • id: '1'
  • select_type: 'SIMPLE'
  • table: 'events'
  • type: 'ref'
  • possible_keys: 'index_campaign,idx_campaignid_paid,idx_city_country_code,idx_city_country_code_campaign_id,idx_cid,idx_campaign_id_city_country_code'
  • key: 'idx_campaign_id_city_country_code'
  • key_len: '4'
  • ref: 'const'
  • rows: '471304'
  • Extra: 'Using where; Using temporary; Using filesort'

UPDATE:

Ok, I think it has been solved:

Looking at the pasted query here again I realized that I forget to mention here that there was one more column in the SELECT called 'country_name'. So the query was very slow then (including country_name), but I'll just leave it out and now the performance of the query is absolutely ok. Sorry for that mistake!

So thank you for all your helpful comments, I'll upvote all the good answers! There were some really helpful additions, that I probably also we apply (like changing types etc).

  • what does explain SELECT COUNT(*) AS counted_events, country_code FROM events WHERE campaign_id` in (597) GROUPY BY city, country_code ORDER BY counted_events gives? – sathia May 12 '15 at 08:44
  • 'Explain' shows that out of many possible index MySql uses this one: 'idx_campaign_id_city_country_code', for rows it shows: '471304' and for 'Extra' it shows: 'Using where; Using temporary; Using filesort' – user3767671 May 12 '15 at 08:48
  • The evil here is `ORDER BY counted_events` which is causing `Using temporary; Using filesort' ` – Abhik Chakraborty May 12 '15 at 08:55
  • There are too many rows to sort. I think the problem come from database design – Truong Hua May 12 '15 at 08:56
  • @TruongHua there's one table, so it's not like there's a database design. he just needs to fix types and indexes. this query will run fine once optimised – sathia May 12 '15 at 09:03
  • please format your output and also add the actual schema for the table. it's hard to help otherwise – sathia May 12 '15 at 09:07
  • In general, if an unaggregated column exists in a SELECT clause, then we would expect to find it in the GROUP BY clause (*and vice versa*). This isn't an absolute requirement in MySQL, but you should have good and clear reasons for its omission. – Strawberry May 12 '15 at 09:34
  • Also (a minor point) country_code is VARCHAR(2), which implies that more than half of your country codes are single letter. This is unlikely (and if using ISO 3166-1, in fact impossible). CHAR(2) would be a better DATA TYPE – Strawberry May 12 '15 at 09:38
  • That `CHAR(2)` should be `CHARACTER SET ascii`. If it is utf8, it will unnecessarily take 6 bytes. – Rick James May 12 '15 at 15:22

4 Answers4

3

without seeing what EXPLAIN says it's a long distance shot, anyway:

  1. make an index on (city,country_code)
  2. see if there's a way to use partitioning, your table is getting rather huge
  3. if country code is always 2 chars change it to char
  4. change numeric indexes to unsigned int

post entire EXPLAIN output

sathia
  • 2,192
  • 2
  • 24
  • 42
  • 1
    please format your output and also add the actual schema for the table. it's hard to help otherwise – sathia May 12 '15 at 09:24
  • cool, don't forget to change types, it is very important. I also asked to see the actual show create table result because for example, there's no need to use utf-8 charset on country_code, a simple latin_general_ci charset will be perfect for your needs and it will save a byte for each character :) – sathia May 12 '15 at 10:19
0

don't use IN() - better use:

WHERE campaign_id = 597
OR campaign_id = 231
OR ....

afaik IN() is very slow.

update: like nik0lias commented - IN() is faster than concatenating OR conditions.

low_rents
  • 4,481
  • 3
  • 27
  • 55
  • Then you'd be wrong. Theres lots of evidence that IN is actually quicker. This is just one question..http://stackoverflow.com/questions/782915/mysql-or-vs-in-performance – nik0lai May 12 '15 at 08:48
  • For `25,000,000` rows multiple `or` does not work even with indexed columns. – Abhik Chakraborty May 12 '15 at 08:48
  • @northkildonan It sounded more harsh than I intended! – nik0lai May 12 '15 at 08:53
  • @nik0lias nvm, i thought i ran over this exact problem - but i did not double check - so my bad. i somehow confused it with some PHP array search behaviour. – low_rents May 12 '15 at 08:54
0

Some ideas:

  • Given the nature and size of the table it would be a great candidate for partitioned tables by country. This way the events of every country would be stored in a different physical table even if it behaves as a virtual big table

  • Is country code an string? May be you have a country_id that could be easier to sort. (It may force you to create or change indexes)

  • Are you really using the city in the group by?

borjab
  • 11,149
  • 6
  • 71
  • 98
0
  • partitioning - especially by country will not help
  • column IN (const-list) is not slow, it is in fact a case with special optimization

The problem is, that MySQL doesn't use the index for sorting. I cannot say why, because it should. Could be a bug.

The best strategy to execute this query is to scan that sub-tree of the index where event_id=597. Since the index is then sorted by city_id, country_code no extra sorting is needed and rows can be counted while scanning.

So the indexes are already optimal for this query. MySQL is just not using them correctly.


I'm getting more information off line. It seems this is not a database problem at all, but

  1. the schema is not normalized. The table contains not only country_code, but also country_name (this should be in an extra table).
  2. the real query contains country_name in the select list. But since that column is not indexed, MySQL cannot use an index scan.

As soon as country_name is dropped from the select list, the query reverts to an index-only scan ("using index" in EXPLAIN output) and is blazingly fast.

XL_
  • 679
  • 5
  • 3