Self join association has_many and belongs_to (one)

Question

I would like to make a self join association which has two relationships

has_many and belongs_to (one)

I've found 2 ways to do this but I am not sure which one is more accurate, for example in this answer has_many and belongs_to within same model there is a relation between employee and manager, manager has_many employees, and employee belongs_to manager

the solution was to add a field called manager_id, so if the user is a manager then the manager_id will be null, if an employee then manager_id will be the id of the manager.

The associations look like this :

has_many :employees, class_name: "User", foreign_key: :manager_id
belongs_to :manager, class_name: "User", foreign_key: :manager_id

The SECOND solution which I found here Rails Associations - has_many => :through - but same model has the following relation : post has_many related_posts and related_post belongs_to post

The solution here was to create two separate tables, one for posts and one for related_posts like this :

create_table "posts", :force => true do |t|
    t.string   "name"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  create_table "related_posts", :force => true do |t|
    t.integer  "post_id"
    t.integer  "related_post_id"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

and the associations look like this :

class Post < ActiveRecord::Base
  has_many :related_posts_association, :class_name => "RelatedPost"
  has_many :related_posts, :through => :related_posts_association, :source => :related_post
  has_many :inverse_related_posts_association, :class_name => "RelatedPost", :foreign_key => "related_post_id"
  has_many :inverse_related_posts, :through => :inverse_related_posts_association, :source => :post
end

class RelatedPost < ActiveRecord::Base
  belongs_to :post
  belongs_to :related_post, :class_name => "Post"
end

Now I am not sure which way to go, the first solution looks simple and the idea was to add an additional field called manager_id (or can be post_id if we are working with posts) which will be null for all the managers people, I imagine having a lot of records with null value on that field which seems not correct...

the second solution looks good but the wrong thing I feel about it is the relation has_many :inverse_related_posts_association which seems like a related_post has also other related posts / or maybe a related_post belongs to many posts (I am not sure).

In my case I want something like post has_many related_posts and a related_post belongs_to (one) post

Baradzed · Accepted Answer · 2016-08-28T10:43:51.250

1

I'd go with the first option, unless you have a very compelling reason to take second approach. It is simple and obvious. And you can always refactor your code later.

There's nothing particularly bad about having nulls in a column, as long as the column is relevant for all records. In your case, NULL in manager_id would mean 'this person does not have a manager'. Is NULL value relevant here? My answer is yes.

NULL means the value is unknown or no value. Rails statement @person.manager.present? will properly return true for people who have manager defined, and false for those who don't have a manager. When looking at a database record for any particular person, NULL in the manager_id field will convey the same meaning.

edited Aug 28 '16 at 10:43

answered Aug 28 '16 at 08:34

Baradzed

578
3
10

in database modeling the first option is the obvious and correct way to go, but think about millions of records with null value in that field, I heard that having a lot of nulls is a bad thing ! what do you mean by relevant for all records ? – medBouzid Aug 28 '16 at 08:44
I expanded the relevance part in the answer, and will address the rest of your questions in comments. – Baradzed Aug 28 '16 at 10:45
From architectural point of view, if you have many nullable columns in your table, and the records would be sparsely filled (_each record_ expected to have many NULLs), it _may_ be a sign of bad database design (first thing to consider would be some more normalization). But this does not sound to be your case. – Baradzed Aug 28 '16 at 10:46
"think about millions of records with null value in that field" — is your app reasonably expected to store info about _MILLIONS_ of managers? There are no companies of this size that I know of. And what is the expected ratio of managers to employees in this case? Side note: most RDBMS systems deal with millions of records just fine on moderate hardware. – Baradzed Aug 28 '16 at 11:00
the manager-employee was just an example to explain the issue, my case is more similar to post - related posts, and yes you can expect millions of posts if you have some kind of social network for example where people post things... my worry was just about having millions of rows with a null value in that field – medBouzid Aug 28 '16 at 11:24
I am also thinking about "fast request" and maybe it's better to create a field that store an array of "related posts", because when someone create a post, he/she can also specify related posts manually, so by storing them in the same field as an array I will avoid the join operation! what do you think? – medBouzid Aug 28 '16 at 11:41
Modern RDBMS'es do joins very effectively, avoiding a single join usually does not impact response time. Just remember that such an optimization may very well be premature optimization, and you may find that your time would be better spent optimizing other parts of your system/doing some smart caching or just implementing an important feature. In this case you can easily create a test database filled with a needed number of simulated posts and cross-references, run few requests and see how quick the response time is. – Baradzed Aug 28 '16 at 13:35
For "related posts" feature you will probably need to fetch not only ids, but also author, post title and/or few initial sentences, am I right? By avoiding a join in this scenario you can end up with worse performance. – Baradzed Aug 28 '16 at 13:44
I need to fetch only one related post randomly from the related_posts, what I am doing is : 1 - getting the post ( Post.find(params[:id]). 2- check if that post has related posts. 3- pick a one random post from related posts and show it. and NOO the related post is exclusive to the main post. no many to many here – medBouzid Aug 28 '16 at 13:49
For this scenario, fetching a random record from related posts would be something like `post.related_posts.limit(1).offset(random_int).first`. But, you would need to count related posts first (to calculate the random_int). It is trivial to request this number from the database, and would not have much of an impact on performance. Time would be mainly spent on round trip to the database. If I were concerned with round trip time, I would store number of related posts in Post. – Baradzed Aug 28 '16 at 14:06
wow I like your request, imagine I have one post with id=3 and 5 related posts, the total of post + related posts = 6 right. if I want to fetch randomly one of the 6 posts (the original post combined with the 5 related) the request will be like this : `Post.where(id: 3).or(Post.where(original_post_id: 3)).limit(1).offset(5).first` is that correct ? – medBouzid Aug 28 '16 at 14:22
but even that is one request I still need to hit the database the first time to get the count number :( – medBouzid Aug 28 '16 at 14:33
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/122038/discussion-between-medbo-and-baradzed). – medBouzid Aug 28 '16 at 14:42

Self join association has_many and belongs_to (one)

1 Answers1