6

I have a table that will have millions of entries, and a column that has BIGINT(20) values that are unique to each row. They are not the primary key, but during certain operations, there are thousands of SELECTs using this column in the WHERE clause.

Q: Would adding an index to this column help when the amount of entries grows to the millions? I know it would for a text value, but I'm unfamiliar with what an index would do for INT or BIGINT.

A sample SELECT that would happen thousands of times is similar to this:

`SELECT * FROM table1 WHERE my_big_number=19287319283784
Drew
  • 24,851
  • 10
  • 43
  • 78
Indigenuity
  • 9,332
  • 6
  • 39
  • 68

3 Answers3

7

If you have a very large table, then searching against values that aren't indexed can be extremely slow. In MySQL terms this kind of query ends up being a "table scan" which is a way of saying it must test against each row in the table sequentially. This is obviously not the best way to do it.

Adding an index will help with read speeds, but the price you pay is slightly slower write speeds. There's always a trade-off when making an optimization, but in your case the reduction in read time would be immense while the increase in write time would be marginal.

Keep in mind that adding an index to a large table can take a considerable amount of time so do test this against production data before applying it to your production system. The table will likely be locked for the duration of the ALTER TABLE statement.

As always, use EXPLAIN on your queries to determine their execution strategy. In your case it'd be something like:

EXPLAIN SELECT * FROM table1 WHERE my_big_number=19287319283784
tadman
  • 208,517
  • 23
  • 234
  • 262
  • I'm not familiar with EXAMINE, it doesn't work when I try it in Workbench, does EXPLAIN do what you're intending with EXAMINE? – Indigenuity Nov 13 '12 at 20:09
  • 1
    `EXPLAIN` might work better because it's actually the right command. Thanks for pointing that out. – tadman Nov 13 '12 at 21:58
1

It will improve your look up (SELECT) performance (based on your example queries), but it will also make your inserts/updates slower. Your DB size will also increase. You need to look at how often you make these SELECT calls vs. INSERT calls. If you make a lot of SELECT calls, then this should improve your overall performance.

Oleksi
  • 12,947
  • 4
  • 56
  • 80
  • Would it be a huge detriment to inserts/updates? This is not just a heavy-read table, but a heavy-write table – Indigenuity Nov 13 '12 at 20:01
  • The best way would be to implement it and do some profiling to see the impact. Typically the read performance increase is greater than the write performance hit, but I don't know that it's a great idea for a heavy-write table. – Oleksi Nov 13 '12 at 20:06
1

I have a 22 million row table on amazon ec2 small instance. So it is not the fastest server environment by a long shot. I have this create:

CREATE TABLE huge
(
    myid int not null AUTO_INCREMENT PRIMARY KEY,
    version int not null,
    mykey char(40) not null,
    myvalue char(40) not null,
    productid int not null
);

CREATE INDEX prod_ver_index ON huge(productid,version);

This call runs finishes instantly:

select * from huge where productid=3333 and version=1988210878;

As for inserts, I can do 100/sec in PHP, but if i cram 1000 inserts into an array use implode on this same table I get get 3400 inserts per second. Naturally your data is not coming in that way. Just saying the server is relatively snappy. But as tadman suggests, and he meant to say EXPLAIN not examine, in front of a typical statement to see if the key column is showing an index that will be used were you to run it.

General Comments

For slow query debugging, place the word EXPLAIN in front of the word select (no matter how complicated the select/join may be), and run it. Though the query will not be run in normal fashion with resolving the resultset, the db engine will produce (almost immediately) an execution plan it would attempt. This plan may be abandoned when the real query is run (the one prior to putting EXPLAIN in front of it), but it is a major clue-in to schema shortcomings.

The output of EXPLAIN appears cryptic for those first reading one. Not for long though. After reading a few articles about it, such as Using EXPLAIN to Write Better MySQL Queries, one will usually be able to determine which sections of the query are using which indexes, using none and doing slow tablescans, slower where clauses, derived and temp tables.

Using the output of EXPLAIN sized up against your schema, you can gain insight into strategies for index creation (such as composite and covering indexes) to gain substantial query performance.

Sharing

Sharing this EXPLAIN output and schema output with others (such as in stackoverflow questions) hastens better answers concerning performance. Schema output is rendered with such statements as show create table myTableName. Thank you for sharing.

Drew
  • 24,851
  • 10
  • 43
  • 78