1

I've had problems lately with a few queries on my database, this is due to the fact that one of my database tables has expanded from 50,000 rows to 1,000,000 rows to 2,000,000.

the table books has 2,000,000 records of books and has all kinds of information ranging from the ranking of the book to size of the book.

I tried running a simple query...

SELECT isbn FROM books WHERE rank < 100000

After waiting for 10 minutes I had to terminate the query as it just kept stalling, I do not have this problem when querying other tables so I assume the problem is with the size of the table.

I have started reading up indexing and so I would like to know if this is the best way to remedy my situation and if not what would be?

mk_89
  • 2,692
  • 7
  • 44
  • 62
  • Did you run this query from code, or from a SQL client? If you ran it from code it may not be the query, but may rather be the rendering code as Aheho's answer mentions. – Jeff Mar 14 '13 at 17:19

2 Answers2

2

Index the rank column and watch it fly

http://dev.mysql.com/doc/refman/5.0/en/create-index.html

How do I add indices to MySQL tables?

Community
  • 1
  • 1
YXD
  • 31,741
  • 15
  • 75
  • 115
  • OK thats interesting, from the 2nd link you posted im assuming indexing the rank column would be just like making the isbn column a primary key – mk_89 Mar 14 '13 at 17:04
  • If you want to filter by the value of `rank` quickly (in logarithmic time), you want to know the order of the rows sorted by `rank`. That's what the index effectively gives you. Adding an index to a different column won't help achieve this. If there is no index on the column, MySQL will have to search through every single entry to see if `rank < 100000` – YXD Mar 14 '13 at 17:09
  • OK, I can see some improvements, am I right in assuming that you should not over index a table and carefully choose which columns to index? also the ranks of books are constantly changing doesn't that mean I need to re-index the whole table? – mk_89 Mar 14 '13 at 17:18
  • The index will be maintained automatically when things change – YXD Mar 14 '13 at 17:21
  • Would read these answers: [[1](http://stackoverflow.com/questions/107132/what-columns-generally-make-good-indexes), [2](http://stackoverflow.com/questions/79241/how-do-you-know-what-a-good-index-is), [3](http://stackoverflow.com/questions/4093788/how-do-i-know-when-to-index-a-column-and-with-what)] for more information – YXD Mar 14 '13 at 17:23
1

Adding an index on the rank column would be the first thing I'd try.

However I would say that a table with 2 millions rows really isn't considered that big, and I find it odd that the database couldn't do a tablescan of 2 million rows in less than 10 minutes given the fact that you aren't even joining any other tables in the query.

How many rows do you expect will be returned? Are you positive that the delay is due to the database server, and not somewhere else in the chain? Like rendering a 200,000 row html table on a web client?

Aheho
  • 12,622
  • 13
  • 54
  • 83
  • I expected around 500k rows to be returned, I have not restarted the server for a few weeks now which could probably help but im running a big job at the moment updating the rank column of all 2,000,000 records – mk_89 Mar 14 '13 at 17:22
  • If you're expecting 25% of the rows to match the query condition, I'm afraid you are not going to see a big bump in speed by adding an index on Rank. You may see some, but your query is still probably going to take minutes to return. – Aheho Mar 14 '13 at 17:32