122

I'm implementing functionality to track which articles a user has read.

  create_table "article", :force => true do |t|
    t.string   "title"
    t.text     "content"
  end

This is my migration so far:

create_table :user_views do |t|
  t.integer :user_id
  t.integer :article_id
end

The user_views table will always be queried to look for both columns, never only one. My question is how my index should look like. Is there a difference in the order of these tables, should there be some more options to it or whatever. My target DB is Postgres.

add_index(:user_views, [:article_id, :user_id])

Thanks.

UPDATE:
Because only one row containing the same values in both columns can exist (since in knowing if user_id HAS read article_id), should I consider the :unique option? If I'm not mistaken that means I don't have to do any checking on my own and simply make an insert every time a user visits an article.

anothermh
  • 9,815
  • 3
  • 33
  • 52
Emil Ahlbäck
  • 6,085
  • 8
  • 39
  • 54
  • "The user_views table will always be queried to look for both columns, never only one." -- there will never be a "find all articles that this user has viewed", or "find all users who have viewed this article" query? I find that surprising. – David Aldridge Aug 13 '15 at 13:30

2 Answers2

260

The order does matter in indexing.

  1. Put the most selective field first, i.e. the field that narrows down the number of rows fastest.
  2. The index will only be used insofar as you use its columns in sequence starting at the beginning. i.e. if you index on [:user_id, :article_id], you can perform a fast query on user_id or user_id AND article_id, but NOT on article_id.

Your migration add_index line should look something like this:

add_index :user_views, [:user_id, :article_id]

Question regarding 'unique' option

An easy way to do this in Rails is to use validates in your model with scoped uniqueness as follows (documentation):

validates :user, uniqueness: { scope: :article }
Nick Merrill
  • 1,242
  • 1
  • 14
  • 20
sscirrus
  • 55,407
  • 41
  • 135
  • 228
  • 10
    The order matters enormously in indexing. Place the where clauses to the left and complete the index with the ordering columns to the right. http://stackoverflow.com/questions/6098616/dos-and-donts-for-indexes – Denis de Bernardy May 29 '11 at 20:36
  • 2
    Note that `validates_uniqueness_of` (and its cousin, `validates uniqueness:`) are prone to race conditions – Ben Aubin Dec 04 '17 at 01:30
  • 1
    As mentioned in the comments above and https://stackoverflow.com/a/1449466/5157706 and https://stackoverflow.com/a/22816105/5157706, consider adding unique index on the database as well. – Akash Agarwal Jan 27 '18 at 06:38
  • "Put the most selective field first, i.e. the field that narrows down the number of rows fastest." This is a good heuristic. But I have a case where I have 3 fields that I can search on, but if using the second one then the first is always also provided, and similarly if using the third then the first 2 are always provided. Even though the third one technically narrows the answer the most, because it's only ever provided in the case that 2 other fields are used the index makes sense to be in the order of their prevalence of usage. – Cruncher Jan 26 '21 at 20:48
36

Just a warning about checking uniqueness at validation time vs. on index: the latter is done by database while the primer is done by the model. Since there might be several concurrent instances of a model running at the same time, the validation is subject to race conditions, which means it might fail to detect duplicates in some cases (eg. submit twice the same form at the exact same time).

olivier
  • 361
  • 3
  • 2
  • So which one is better? Database side or validates_uniqueness_of? – W.M. Jun 18 '16 at 15:58
  • 13
    Both. validates_uniqueness_of can be used to display an error message gracefully in the application for example when a form gets saved. Database constraint would make sure you don't end up with dup records even know you had validation specified in the model. Plus, you can rescue the ActiveRecord exception and also show a nice message to the user. – Uģis Ozols Aug 08 '16 at 08:36
  • 8
    @W.M. If you have to pick one, go with the database constraint. This will work even if different, non RoR applications interact with your data, and ensures consistency for the long term. – mooreds Oct 07 '16 at 19:53
  • 1
    Use both. validations (can) give users a better experience. constraints save the day. – David Hempy Dec 13 '22 at 15:08