3

I have a table called data in mysql database. The table is quite large and has about 500k records and this number will grow up to 1 million. Each record consists of about 50 columns and most of them contain varchars.

The data table is being used very frequently. Actually, most queries access this table. The data is being read from and written to it by ~50 users simultaneously. The system is highly loaded with the users uploading and checking their data so it can be stopped maximum for an hour or two.

After some research. I found out that almost all the select queries that have 'where' clause use one of four fields in the table. That fields are: isActive, country, state, city - all are in the int format. The where can be either

    where isActive = {0|1}

or

    where isActive = {0|1} and {country|state|city} = {someIntValue}

or

    where {country|state|city} = {someIntValue}

And the last thing is that the table does not have any indexes except for the primary id one.

After the table has grown to current sizes i faced some performance issues.

So, my question is if i create the indexes on the columns isActive, country, state and city will the performance increase?

UPD: I've just created an index on one of that fields and WOW! the queries are being executed immediately. Thank you, guys.

kbeat
  • 149
  • 1
  • 12
  • 1
    Yes it will - just try it - you don't lose any data. You could use tinyint when just using values 0 or 1. – bestprogrammerintheworld Jun 06 '13 at 19:08
  • is it a complex single index for all that fields or separate ones for each field? as i've understood, creating index will cause table recreation and i don't want to risk. – kbeat Jun 06 '13 at 19:11
  • 1
    Table recreation? Creating or deleting an index will not affect any of the data, or the table itself. – PeteGO Jun 06 '13 at 19:15
  • To answer your question we need the current and target cardinality of each of the 4 columns. (total distinct / total rows, now and an estimation of what you think it will be when the table will have 1M records). – Sebas Jun 06 '13 at 19:26
  • Do you have queries which check all 3 (or 4) columns, like: `where (country = X and state = Y and city = Z)` ? – ypercubeᵀᴹ Jun 06 '13 at 19:39
  • No, i've eliminated all such queries on the language level – kbeat Jun 06 '13 at 19:40
  • BTW, creating index really causes the table recreation so i had about 10 insert and update request being locked while the index was being created – kbeat Jun 06 '13 at 19:41
  • With your current "solution" you are rolling a dice. I suggest you extract the data I asked for so we can provide a correct assessment. – Sebas Jun 06 '13 at 19:42
  • It would also help if you added the output of `SHOW CREATE TABLE tablename;` in the question. – ypercubeᵀᴹ Jun 06 '13 at 19:43

3 Answers3

2

I don't think it's a good idea to index the isActive field because it'll cause the indexing overhead when adding/updating/deleting, but it'll only split data in two chunks (1 and 0) when reading so it'll not really help.

Edit: found this to explain the point above: Is there any performance gain in indexing a boolean field?

For the other tree columns, I recommend you to do a benchmark when most user are offline (in the night, or lunch time) and see how it affect performance, but I think it'll really help without many downsides.

Edit: ypercube has signaled some interesting use cases where my answer about indexing boolean field isn't relevant, check comments.

Community
  • 1
  • 1
Atrakeur
  • 4,126
  • 3
  • 17
  • 22
  • 2
    It may be helpful to index the `isActive` as well. If there is imbalance and say, only 1% have `isActive=1`, then queries with this condition would use the index and probably be more efficient than with a full table scan. – ypercubeᵀᴹ Jun 06 '13 at 19:41
  • The index would also be useful for queries of the type `WHERE isActive=X ... LIMIT (Y)` where `X` can be either 0 or 1 and `Y` not huge. – ypercubeᵀᴹ Jun 06 '13 at 19:56
  • After all the data is loaded the field `isActive` will become unuseful due to the business logic. – kbeat Jun 06 '13 at 20:03
1

Yes creating an index on each of these columns will help you.

Consider and underline the word each. A separate index for each one is what I suggest. The reason being coexistence of different combinations of the columns.

RGV
  • 732
  • 3
  • 10
1

Yes, definately. you may see even better results if you include selected additional fields to each index too. Just take careful notice of the column order... But before all else, make sure you dont use myisam engine for a big table with many writes! Switch to innodb for example.

Mattias Åslund
  • 3,877
  • 2
  • 18
  • 17