2

I am having a question about "which storage device to choose" for my database tables. I have a table with 28 million records. I will insert data after creating the table, after that, no insert - update -delete operation will take place. Never. Only select operations.

I have a query like below

SELECT `indexVal`, COUNT(`indexVal`) FROM `key_word` WHERE `hashed_word` IN ('001','01v','0ji','0k9','0vc','0@v','0%d','13#' ,'148' ,'1e1','1sx','1v$','1@c','1?b','1?k','226','2kl','2ue','2*l','2?4','36h','3au','3us','4d~') GROUP BY `indexVal` 

This counts how many number of times a particular result appeared in search. In InnoDB, this operation took 5 seconds. This is too much, because my orifginal dataset will be in billions.

To do this kind of work, which MySQL storage you recommend?

Dongle
  • 602
  • 1
  • 8
  • 18
  • 2
    Which indexes are defined on the table? What is the execution plan? –  Apr 27 '14 at 10:09
  • @a_horse_with_no_name: Hi, I will have indexes as defined in below SO user "Thilows" 's answer. – Dongle Apr 27 '14 at 10:13
  • Is this a flat table (does it have relations with other tables)? If its flat, then the fastest way would be to store it in a cache. If you will only execute one type of query, and your data set is _very large_, then you need to consider an alternative solution with map/reduce; like couchdb which would be a lot faster. – Burhan Khalid Apr 27 '14 at 10:13
  • 1
    If you don't ever update that table, why not store the result of the aggregation in a new table after you inserted everything into the `key_word` table? –  Apr 27 '14 at 10:14
  • Also check out this: [MyISAM versus InnoDB](http://stackoverflow.com/questions/20148/myisam-versus-innodb) – vitro Apr 27 '14 at 10:14
  • @a_horse_with_no_name Could you elaborate on the aggregation idea? I'm not sure how a "pre aggregation" would work when the condition is on hashed_word and the grouping is on indexVal. Changing the list of hashed words would affect the values per indexVal, so I can't see a _great_ way to pre calculate. Do you mean some sort of pre calculated hashed_value count per indexVal so you could use `sum` to aggregate instead of count? – Joachim Isaksson Apr 27 '14 at 10:24

2 Answers2

0

More than the storage engine, having the proper index in place seems important.

In your case, CREATE INDEX idx_1 ON key_word (index_val, hashed_word) should help.

And if the data truly never changes, you could even pre-compute and cache some of those results.

For example

 CREATE TABLE counts  AS SELECT index_val, hashed_word, count(index_val) 
 FROM key_word
 GROUP BY index_val, hashed_word 
Thilo
  • 257,207
  • 101
  • 511
  • 656
  • 1
    Wouldn't be `hashed_word, index_val` better? (Although I'm not sure MySQL can use an index for an `OR` condition like that) –  Apr 27 '14 at 10:12
  • @a_horse_with_no_name: Inside my query, I already use `OR` condition inside the `IN` clause right? Sorry I am not good with SQL. – Dongle Apr 27 '14 at 10:14
  • @a_horse_with_no_name: Maybe. I'm also worried if MySQL can handle the OR, so I put index_val first. Only one way to tell: measure both. – Thilo Apr 27 '14 at 10:14
  • @a_horse_with_no_name: The index I have is `ALTER TABLE key_word ADD INDEX (hashed_word, indexVal)` – Dongle Apr 27 '14 at 10:15
  • @Thilo: Yes, the data never change. if do, we can rebuild the tables. – Dongle Apr 27 '14 at 10:16
  • 1
    @Dongle: Try both indexes, and get execution plans. See which one works better. – Thilo Apr 27 '14 at 10:16
  • @Thilo: `hashed_word, index_val` is better, but it takes 5 seconds. that is slow. – Dongle Apr 27 '14 at 10:17
  • Try pre-computed aggregates then. – Thilo Apr 27 '14 at 10:23
  • @Thilo: Your new table generated results only with 1 in "count" column. – Dongle Apr 27 '14 at 13:01
0

For SELECT-only queries, ARCHIVE is the fastest storage engine. As it is MyISAM-based, and the following advice is for MyISAM as well, don't use varchar but fixed-size char columns, and you will get better performance.
Sure, even faster if it's the data is loaded in memory, instead read from disk.

Maxim Krizhanovsky
  • 26,265
  • 5
  • 59
  • 89