70

I have a pretty simple HABTM set of models

class Tag < ActiveRecord::Base 
   has_and_belongs_to_many :posts
end 

class Post < ActiveRecord::Base 
   has_and_belongs_to_many :tags

   def tags= (tag_list) 
      self.tags.clear 
      tag_list.strip.split(' ').each do 
        self.tags.build(:name => tag) 
      end
   end 
end 

Now it all works alright except that I get a ton of duplicates in the Tags table.

What do I need to do to avoid duplicates (bases on name) in the tags table?

Sam Saffron
  • 128,308
  • 78
  • 326
  • 506

12 Answers12

73

Prevent duplicates in the view only (Lazy solution)

The following does not prevent writing duplicate relationships to the database, it only ensures find methods ignore duplicates.

In Rails 5:

has_and_belongs_to_many :tags, -> { distinct }

Note: Relation#uniq was depreciated in Rails 5 (commit)

In Rails 4

has_and_belongs_to_many :tags, -> { uniq }

Prevent duplicate data from being saved (best solution)

Option 1: Prevent duplicates from the controller:

post.tags << tag unless post.tags.include?(tag)

However, multiple users could attempt post.tags.include?(tag) at the same time, thus this is subject to race conditions. This is discussed here.

For robustness you can also add this to the Post model (post.rb)

def tag=(tag)
  tags << tag unless tags.include?(tag)
end

Option 2: Create a unique index

The most foolproof way of preventing duplicates is to have duplicate constraints at the database layer. This can be achieved by adding a unique index on the table itself.

rails g migration add_index_to_posts
# migration file
add_index :posts_tags, [:post_id, :tag_id], :unique => true
add_index :posts_tags, :tag_id

Once you have the unique index, attempting to add a duplicate record will raise an ActiveRecord::RecordNotUnique error. Handling this is out of the scope of this question. View this SO question.

rescue_from ActiveRecord::RecordNotUnique, :with => :some_method
Community
  • 1
  • 1
Jeremy Lynch
  • 6,780
  • 3
  • 52
  • 63
  • 1
    Best answer imho. You are doing something wrong in case you have dupes. Fix the problem with the unique index. – Hendrik Nov 14 '16 at 12:07
  • 1
    I wouldn't override any setters because such things most likely will give an extra pain in the ass to you or your peers. I'd create a separate method to add an associated object unless it was added before. In other words name `tag=` method something like `add_unique_tag` – VAD May 16 '18 at 10:53
25

In addition the suggestions above:

  1. add :uniq to the has_and_belongs_to_many association
  2. adding unique index on the join table

I would do an explicit check to determine if the relationship already exists. For instance:

post = Post.find(1)
tag = Tag.find(2)
post.tags << tag unless post.tags.include?(tag)
Roddy of the Frozen Peas
  • 14,380
  • 9
  • 49
  • 99
spyle
  • 1,960
  • 26
  • 23
  • 3
    Clean solution. In your `Post` model, add `def tag=(tag); tags << tag unless tags.include?(tag); end` for robustness. – scarver2 Sep 17 '13 at 06:09
  • 7
    Interestingly enough, [has_many association documentation](http://guides.rubyonrails.org/association_basics.html) (search for include?) recommends not using the `unless ...include?()` due to race conditions. I'm guessing the same could hold true for has_and_belongs_to_many. – spyle Jun 17 '14 at 20:54
  • Anyone got a confirmation for this? – Redoman Oct 29 '14 at 16:15
  • any confirmation regarding this?? – Alfie May 06 '16 at 21:43
  • @Alfie I've used this in multiple projects in production and never experienced a problem. I've not seen the issue mentioned in has_many docs anywhere else. That's not conclusive however, just my own experience. – spyle May 09 '16 at 23:24
  • Thanks, but i changed to has_many through, since i had other purposes with the join table. So added a model and placed validations there. – Alfie May 10 '16 at 12:46
  • 1
    @spyle If you add the recommended unique index on the joint table, then you will not get duplicates, but you should through an error in the (rare) race condition. – tomf Nov 15 '17 at 20:29
  • `include` is an Array method, it will load and loop over all the records in the relation, which not performant at all. Instead, use `tags.exists?(tag.id)` – Pere Joan Martorell Jul 07 '19 at 18:26
20

You can pass the :uniq option as described in the documentation. Also note that the :uniq options doesn't prevent the creation of duplicate relationships, it only ensures accessor/find methods will select them once.

If you want to prevent duplicates in the association table you should create an unique index and handle the exception. Also validates_uniqueness_of doesn't work as expected because you can fall into the case a second request is writing to the database between the time the first request checks for duplicates and writes into the database.

Simone Carletti
  • 173,507
  • 49
  • 363
  • 364
  • I tried that, it does not really solve the problem cleanly. I want to avoid exceptions in the first place. I think my solution kind of does that though it needs a bit of tweaking. – Sam Saffron Jul 15 '09 at 08:04
  • 2
    Note: I already had the unique constraint, which resulted in exceptions, handling them is a huge pain. – Sam Saffron Jul 15 '09 at 08:05
  • 1
    With the current version of will_paginate (3.0.3) having duplicates in the join table makes it impossible to have unique pagination. – John Jul 30 '12 at 23:08
20

In Rails4:

class Post < ActiveRecord::Base 
  has_and_belongs_to_many :tags, -> { uniq }

(beware, the -> { uniq } must be directly after the relation name, before other params)

Rails documentation

cyrilchampier
  • 2,207
  • 1
  • 25
  • 39
12

Set the uniq option:

class Tag < ActiveRecord::Base 
   has_and_belongs_to_many :posts , :uniq => true
end 

class Post < ActiveRecord::Base 
   has_and_belongs_to_many :tags , :uniq => true
Joshua Cheek
  • 30,436
  • 16
  • 74
  • 83
  • 4
    This doesn't work as expected. You still get duplicate constraint errors when trying to add them via <<. – Luca Spiller Dec 27 '10 at 22:14
  • I just tried (on Rails 2.3.5), and this is not true. You should make your own question where you supply sufficient information to diagnose your issue. – Joshua Cheek Dec 28 '10 at 11:05
  • 15
    I thought uniq ignores duplicates on reading but still allows you to write duplicates which would raise an db exception if you have duplicate constraint at the database layer – Tony Mar 24 '11 at 07:02
  • 3
    This won't prevent duplicates, it will just hide them from displaying – bluehallu Sep 04 '14 at 13:57
4

I worked around this by creating a before_save filter that fixes stuff up.

class Post < ActiveRecord::Base 
   has_and_belongs_to_many :tags
   before_save :fix_tags

   def tag_list= (tag_list) 
      self.tags.clear 
      tag_list.strip.split(' ').each do 
        self.tags.build(:name => tag) 
      end
   end  

    def fix_tags
      if self.tags.loaded?
        new_tags = [] 
        self.tags.each do |tag|
          if existing = Tag.find_by_name(tag.name) 
            new_tags << existing
          else 
            new_tags << tag
          end   
        end

        self.tags = new_tags 
      end
    end

end

It could be slightly optimised to work in batches with the tags, also it may need some slightly better transactional support.

Sam Saffron
  • 128,308
  • 78
  • 326
  • 506
  • This is exactly what the validates_uniqueness_of validator does. However, this solution alone doesn't prevent duplicate items because between your find_by_name and the association, an other request might write to the database. You can use this but you must be aware that some exceptions can occur (because you have an index) and you should catch them. – Simone Carletti Jul 15 '09 at 08:46
  • not really http://github.com/rails/rails/blob/b6bac73b282c7e500c43810f2a937fc0047e5979/activerecord/lib/active_record/validations.rb it is a validation that will exception out if there is a dupe, it does not handle it transparently, it has documented concurrency issues, and does need to be used with a unique index to guarantee no dupes. – Sam Saffron Jul 15 '09 at 10:40
  • Yes, do it like this way and also add a unique index in the database. Then later on you can build on this model and add new data to it in case you need to store more information on each association. – Hendrik Nov 14 '16 at 12:08
4

I would prefer to adjust the model and create the classes this way:

class Tag < ActiveRecord::Base 
   has_many :taggings
   has_many :posts, :through => :taggings
end 

class Post < ActiveRecord::Base 
   has_many :taggings
   has_many :tags, :through => :taggings
end

class Tagging < ActiveRecord::Base 
   belongs_to :tag
   belongs_to :post
end

Then I would wrap the creation in logic so that Tag models were reused if it existed already. I'd probably even put a unique constraint on the tag name to enforce it. That makes it more efficient to search either way since you can just use the indexes on the join table (to find all posts for a particular tag, and all tags for a particular post).

The only catch is that you can't allow renaming of tags since changing the tag name would affect all uses of that tag. Make the user delete the tag and create a new one instead.

Jeff Whitmire
  • 750
  • 1
  • 7
  • 12
3

To me work

  1. adding unique index on the join table
  2. override << method in the relation

    has_and_belongs_to_many :groups do
      def << (group)
        group -= self if group.respond_to?(:to_a)
        super group unless include?(group)
      end
    end
    
2

This is really old but I thought I'd share my way of doing this.

class Tag < ActiveRecord::Base 
    has_and_belongs_to_many :posts
end 

class Post < ActiveRecord::Base 
    has_and_belongs_to_many :tags
end

In the code where I need to add tags to a post, I do something like:

new_tag = Tag.find_by(name: 'cool')
post.tag_ids = (post.tag_ids + [new_tag.id]).uniq

This has the effect of automatically adding/removing tags as necessary or doing nothing if that's the case.

Javeed
  • 99
  • 3
  • 2
    Another way to express last line is `post.tag_ids |= [new_tag.id]` or `post.tags |= [new_tag]`, using [Array's set-like operators](http://www.virtuouscode.com/2012/08/27/array-set-operations-in-ruby/) – Beni Cherniavsky-Paskin Nov 19 '17 at 18:10
1

Extract the tag name for security. Check whether or not the tag exists in your tags table, then create it if it doesn't:

name = params[:tag][:name]
@new_tag = Tag.where(name: name).first_or_create

Then check whether it exists within this specific collection, and push it if it doesn't:

@taggable.tags << @new_tag unless @taggable.tags.exists?(@new_tag)
dav1dhunt
  • 179
  • 1
  • 10
0

You should add an index on the tag :name property and then use the find_or_create method in the Tags#create method

docs

ajbraus
  • 2,909
  • 3
  • 31
  • 45
0

Just add a check in your controller before adding the record. If it does, do nothing, if it doesn't, add a new one:

u = current_user
a = @article
if u.articles.exists?(a)

else
  u.articles << a
end

More: "4.4.1.14 collection.exists?(...)" http://edgeguides.rubyonrails.org/association_basics.html#scopes-for-has-and-belongs-to-many

Matthew Bennett
  • 303
  • 2
  • 12