0

I'm using MySQL with Django framework. (I have plan to change MySQL to PostgreSQL later)

Following is example case which abstracted real DB.


Column description

  1. ID(int(11) type) : auto-incremented value from 1

  2. Column A(int(10) type) : choice from 10 values

  3. Column B(longtext type): Text field


Example of data row

ID     |    Column A    |      Column B
               ...
21     |    301010101   |  TGGQtY84r033i0F6tpx3...
22     |    301010102   |  31TfNgzpxkcuMLxrrZ6D...
23     |    301010103   |  U069Z5kG354BwDriFw6d...
24     |    301010107   |  d4MSkCBxwZzKusALQAIQ...
25     |    301010105   |  R1SJCWeM62P1ikQwmG3f...
26     |    301010103   |  bVScBZbf0n1tkdgFCwmD...
27     |    301010102   |  4UpQGyCz5KhlolEdsO8M...
28     |    301010101   |  x89gOjNS4J4xiP1DfIWH...
29     |    301010110   |  STMlfUwx8afCZBsa8CWJ...
30     |    301010101   |  XctEBThnlA5MYTKqycLJ...
31     |    301010104   |  fRAEBMXDEdNFn5aENn4r...
31     |    301010105   |  GlIwVjVF16WE4zWnnSy8...
               ...
  1. Many rows (more than hundred thousands)

  2. Each row have Column A value among 10 values (301010101, .., 301010110)


Query usage

  1. Grouping by Column A first, then ordering by ID descendingly.

    For example, Models.object.filter(Column A = "301010101").order_by('-id')


Question

For above case, what column would be the best for index ?

  1. ID as index (Default)

  2. Column A as index

  3. Using ID and Column A together as index (Multi-index)


Edit : The result

I tested it with 500000 rows of random data. (with `ordering=['-id'] in Class Meta)

Then, I tested query Models.objects.filter(Column A = "301010101")

  1. ID as index (Default) : 0.33 sec

  2. Using ID and Column A together as index (Multi-index) : 0.12 sec

From the above test result, I convinced that using ID and Column A together as Multi-index is the most optimized case.

Community
  • 1
  • 1
Chemical Programmer
  • 4,352
  • 4
  • 37
  • 51
  • For your first condition a composed index on `(ColumnA,id)` would be helpful.Test it directly in mysql with EXPLAIN.But given the choices I would guess 2 – Mihai Jan 17 '15 at 17:57
  • I tested it by myself with 500000 random rows of data as you suggested. The result is, using `ID` and `Column A` together as index is the fastest case. – Chemical Programmer Jan 18 '15 at 05:55

2 Answers2

2

For that "specific" query, it appears that you'd have better performance on a dual-column index, as suggested, on (column a,id).

From the manual:

"If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer attempts to use the Index Merge optimization (see Section 8.3.1.4, “Index Merge Optimization”), or attempts to find the most restrictive index by deciding which index excludes more rows and using that index to fetch the rows"

Lance
  • 638
  • 1
  • 6
  • 22
  • wouldn't the index on (column a,id) only be used (increase speed) when filtering by 'column a' and 'id'? i.e. `Models.object.filter(Column A = "301010101", id=12)` – warath-coder Jan 17 '15 at 19:20
  • it is my understanding that the "id" column is being used in that query via the "order by" clause – Lance Jan 18 '15 at 02:56
  • Thanks for your answer. I have a one more question. Assume case that table is already ordered by `-id` (`ordering=['-id;]` in Class Meta). In this case we don't have to query `order_by('-id')`. Then, single index as `Column A` is okay ? – Chemical Programmer Jan 18 '15 at 04:55
  • that's a good question. As for "good practice", I would personally never "assume" anything with the database. If you are depending on the order of the table itself as the default "order by" when the desired "order by" is not included in the query would be an assumption, in this case. The following link has some good discussion on the assumption of order-by-default: http://stackoverflow.com/questions/8746519/sql-what-is-the-default-order-by-of-queries – Lance Jan 18 '15 at 17:24
0

if the ID is an auto_increment column (which i think it should be); then it is already Indexed.

It would help with performance if you indexed Column A as well

warath-coder
  • 2,087
  • 1
  • 17
  • 21