3

Let's say I have a table with millions of rows in which I have 3 integral variables: x,y and z against which I do my searching in a SELECT... WHERE x=a and y=b and z=c

Which would be faster / more efficient?

  • combining the 3 fields into a separate string column "x_y_z" (e.g. 1231_3242_6864) and indexing it

  • Making a 3 column index against the three integers?

paullb
  • 4,293
  • 6
  • 37
  • 65
  • what is the maximum value of each integer? – Sebas Jun 25 '12 at 15:29
  • For the purpose of argument here, say 50 000 each – paullb Jun 25 '12 at 15:34
  • 2
    In terms of storage size, three `UNSIGNED SMALLINT` fields would require 6 bytes total, while a string like `12345_67890_24680` could require 17 bytes. Same principle then applies to RAM usage for the indexes. Some aspects can be tweaked, but in my quick test with 3 mil. rows, the one-string table is 130 MB data / 170 MB index, while the three-Int table is 97 MB data / 76 MB index. – Wiseguy Jun 25 '12 at 19:47
  • This is a no brainer: 3 columns and composite index. – Imre L Jun 25 '12 at 20:25

4 Answers4

2

No it would be worst, string comparaison are much slower. You could eventually (if really needed, I wouldn't recommend it) combine the 3 integers in one integer but only IF THEY FIT.

However, to solve your index, problem the easiest would be to create a composite index on x,y and z.

mb14
  • 22,276
  • 7
  • 60
  • 102
  • you can't combine the integers into one big column. imagine the case (x = 1, y = 3, z = 5) and ( x = 5, y = 1, z = 3), they both add up the same value but mean something very different. – EkoostikMartin Jun 25 '12 at 15:42
  • no you don't add them up, but for example with x, y, z = 3, 5, 7, result would be 3000005000007. That's why I asked the size of integers – Sebas Jun 25 '12 at 15:56
  • @Ekoo of course you don't add them but you do as Sebas said (or shift them in binary) but that depends of the maximum of size of each. Which is what I mean by "if they fit" – mb14 Jun 25 '12 at 16:00
  • I see, so if you can guarantee the concatenation would fit into a "bigint" then why would you recommend against it? Surely a one column index performs better than a 3 column composite index? – EkoostikMartin Jun 25 '12 at 17:51
  • 1
    @Ekoo one column index is probably better than 3 one, but a composite of 3 integer is probably good and less hassle. – mb14 Jun 25 '12 at 18:03
  • 1
    Very true, I'd guess that in 95%+ of cases, the 3 int composite would be plenty sufficient. – EkoostikMartin Jun 25 '12 at 18:13
1

If you could have covering indexes, and all three numbers are always provided so there is no concern about the order in the index (note this is also an issue in the stringified version), I would use a composite index of the three ints.

The three ints will take up less space, making more rows fit per page, which generally makes the index more efficient when being read. You also have a lot more headroom in that integer compared to a five bytes string for 99999 (four bytes for +/- ~2^31).

The magnitude of efficiency is going to be hard to judge in the database query, but remember you are also going to have to assemble and populate these. Not sure how or where you are planning to do that in MySQL - SQL Server has persisted computed columns which might be a good design choice if you were to commit to the string version.

Certainly one wouldn't want to convert integers into strings and concatenate them on the fly before performing a join.

Cade Roux
  • 88,164
  • 40
  • 182
  • 265
0

I think that you would really need to benchmark it yourself (there are surely other factors that will determine the performance of your specific queries on your specific data), but in addition to the idea of using a composite 3 column index proposed by @mb14. You could try this:

Create a new column on your table like this:

xyzcomposite BINARY(16)

Then, create your index on this column.

On inserts, you would need to do an additional step to concat to your string "x_y_z" and then insert like this:

INSERT INTO yourtable (...,xyzcomposite) VALUES (...,UNHEX(MD5('the_xyz_concat')));

Of course you would also need to run the hash algorithm before your select statement as well.

set @xyz = UNHEX(MD5('x_y_z'));
select * from yourtable where xyzcomposite = @xyz

I'm not sure if the sum total of this overhead is worth the performance gain you may see in having to index only this one column vs. a three column index. Like I said you would have to benchmark it against your table and your data.

EDIT: the advantage of this approach is that it would work for x, y and z numbers of any size.

EkoostikMartin
  • 6,831
  • 2
  • 33
  • 62
0

There are a few other questions that are relevant to the discussion at hand:

Indexing Performance BigInt vs VarChar

Is there a REAL performance difference between INT and VARCHAR primary keys?

When should I use a composite index?

From the question about a composite index, one of the answers seems to indicate that if your x, y, and z tend to be unique among themselves (i.e. most x values are different from each other, and so on), you won't get much of a performance boost anyway. But if they do, the composite index seems to be the way to go.

Community
  • 1
  • 1
Andrew C
  • 689
  • 2
  • 9
  • 23
  • There is no issue of overlapping or not values, when the query is `WHERE x=a AND y=b AND z=c` The index will be efficient. That answer is about a query with spatial search (two range conditions), not 3 equality conditions. – ypercubeᵀᴹ Jun 25 '12 at 20:14
  • Overlapping of values may not be the precise phrase I'm looking for, perhaps uniqueness of x, y, and z is more accurate. Basically, the idea is that the index only helps if more than one x = a, and so on. – Andrew C Jun 25 '12 at 20:16
  • Of course, if all the rows in the table have x=a and y=b and z=c, the index won't be of much help :) This is the "selectivity" of the index (regarding a query parameters). And it is related to if the index is helpful and if it's used. But it has nothing to do with your linked answer. – ypercubeᵀᴹ Jun 25 '12 at 20:17