1

I need to do a pretty big search once each day/week/month and having a problem with the speeed. I have been increasing the buffer sizes to use 50% of the ram (2gb) and some other tweaks but none have given any performance increase.

SELECT SUM(MessageCount) AS ErrorCount, Country, StatusCode, StatusText
              FROM messages
              WHERE StatusCode IN(3,4,5,6,8,9,10,11,13) AND SentOn >= 1394275285391 AND SentON < 1396863685391
              GROUP BY Country, StatusCode

Takes around 50-70sec to process when running the query in mysql workshop and the table contains about 6,500,000 rows. To speed it up I added a index for StatusCode, SentOn and Country ( = CHAR(2)) and called it ErrorCountries seen below. I am not really happy about the speed in the workshop but I guess I could live with it if I had to.

I did an explain and got this result:

id=1    
select_typ=SIMPLE
table=messages  type=range
possible_keys=SentOn,ErrorCountries
key=ErrorCountries
key_len=12
rows=290977
filter=100.00
extra=Using index condition; Using MRR; Using temporary; Using filesort

I guess temporary and filesort might be a problem with the speed.

So while I said that the query takes 50-70sec in the workshop, it does not perform like this in my program. I did not have any command timeout in the connection string so I thought that adding it would solve this. But when I now have changed the timeout to 5min I started to wounder where the problem really is...

var sql = "SELECT SUM(MessageCount) AS ErrorCount, Country, StatusCode, StatusText" +
              " FROM messages" +
              " WHERE StatusCode IN(3,4,5,6,8,9,10,11,13) AND SentOn >= " + @from + " AND SentOn < " + to +
              " GROUP BY Country, StatusCode";

var res = db.Database.SqlQuery<Result>(sql).ToList();

Why is this query taking a minute to run in the workshop? Why does it take over 5min to run in the program? I have the program importing to the db at both time. And what can I do to make it faster?

user1842278
  • 1,039
  • 1
  • 12
  • 25

1 Answers1

3

You might try doing three things to improve your speed.

First, you mentioned you created an index on (StatusCode, SentOn, Country). You're on the right track, but you have the order of columns wrong in this index.

This query does a range scan on SentOn, then a value selection on StatusCode, then grabs MessageCount, Country, and StatusText detail fields. The first column in your column index should be the one on which you do a range scan. Try creating an index on (SentOn, StatusCode, MessageCount, Country, StatusText). That should help accelerate your query. Read about covering indexes if you want to know why.

Second, it seems there's a one-to-one relationship between the values of StatusCode and StatusText. I'm guessing this because you aren't grouping by StatusText.

If that is true, try doing your summary query only on StatusCode, using this little subquery.

SELECT SUM(MessageCount) AS ErrorCount, 
       Country, 
       StatusCode
  FROM messages
 WHERE StatusCode IN (3,4,5,6,8,9,10,11,13) 
   AND SentOn >= 1394275285391 
   AND SentON < 1396863685391
 GROUP BY Country, StatusCode

This will be a bit faster, because you won't have to haul around the StatusText values as you do the data shuffling (filesort) stuff required to satisfy your GROUP BY.

You need another subquery to relate StatusCode to StatusText. This might perform horribly; I suggest you try it before you proceed.

SELECT DISTINCT StatusCode, StatusText
  FROM messages
 WHERE StatusCode IN (3,4,5,6,8,9,10,11,13)

You may be able to boost the performance of this subquery by creating a compound index on (StatusCode, StatusText).

Then try joining those two subqueries together.

SELECT a.ErrorCount, a.Country, a.StatusCode, b.StatusText
  FROM (
        SELECT SUM(MessageCount) AS ErrorCount, 
               Country, 
               StatusCode
          FROM messages
         WHERE StatusCode IN (3,4,5,6,8,9,10,11,13) 
           AND SentOn >= 1394275285391 
           AND SentOn < 1396863685391
         GROUP BY Country, StatusCode
       ) AS a
  JOIN (
        SELECT DISTINCT StatusCode, StatusText
          FROM messages
         WHERE StatusCode IN (3,4,5,6,8,9,10,11,13)
       ) AS b ON a.StatusCode=b.StatusCode

Third, if you can drop the StatusText column from your messages table altogether and make yourself a little lookup table to find the StatusText values based on StatusCode, that will likely be a giant win. But that will require a change to your application.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • Thanks for the great answer! I did two tests, first rearranging the indexes which dropped the query time to 28sec. When removing the status text It dropped to 2sec, amazing! – user1842278 Apr 07 '14 at 13:39