0

I have a table/schema with two columns named day of DateTime and user_id of Integer. Right know I made both columns indexed.

Is performance improvements gained from indexing worth it, considering huge fraction of additional space used by the index and there are only two columns? How do you justify them?

How does this differ if I use MongoDB or MySQL?

Community
  • 1
  • 1
Khairul
  • 1,483
  • 1
  • 13
  • 23
  • 1
    Justification comes from querying. This really boils down to the fundamentals of how indexing works and why it makes queries faster, a little point of further reading: http://stackoverflow.com/questions/1108/how-does-database-indexing-work the difference between MySQL and MongoDB is in how they structure indexes and other internal stuff (and space) most of which you will not notice on the outside. – Sammaye Oct 25 '12 at 14:25

2 Answers2

2

If there are few rows, you might not see great improvements with indexes. If there are many rows, you probably will see great improvements.

The good thing is that you don't have to guess, and you don't have to agonize over what few and many mean in practice. Every modern SQL dbms includes some way to measure SELECT statement performance. That includes MySQL.

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
1

Is performance improvements gained from indexing worth it

Depends on the queries you intend to run.

  • If you have something like: WHERE day = ..., then you'll need an index whose leading edge contains day. If properly used, indexes can speed-up querying many orders of magnitude, especially on large data sets.
  • OTOH, every additional index costs space/cache and INSERT/UPDATE/DELETE performance.

At the end of the day, I recommend you measure on realistic amounts of data and come to your own conclusions.

BTW, If you are using InnoDB, then your table is clustered (see also: Understanding InnoDB clustered indexes) and the whole table is effectively stored in the primary index. The secondary indexes in clustered tables contain copy of the PK fields, which (I'm assuming) is user_id in this case. And since we only have two fields in the table, the secondary index on { day } will cover the user_id as well, avoiding a double-lookup that could otherwise happen in a clustered table. Effectively, you'll end-up with two separate (but synchronized) B-Trees and an index-only scan no matter which one of them you access (which is good). Of course, you could explicitly make a composite index on {day, user_id} instead of just { day }, for a very similar effect.

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167