127

I'm trying to do something that I thought it would be simple but it seems not to be.

I have a project model that has many vacancies.

class Project < ActiveRecord::Base

  has_many :vacancies, :dependent => :destroy

end

I want to get all the projects that have at least 1 vacancy. I tried something like this:

Project.joins(:vacancies).where('count(vacancies) > 0')

but it says

SQLite3::SQLException: no such column: vacancies: SELECT "projects".* FROM "projects" INNER JOIN "vacancies" ON "vacancies"."project_id" = "projects"."id" WHERE ("projects"."deleted_at" IS NULL) AND (count(vacancies) > 0).

jphorta
  • 2,144
  • 3
  • 18
  • 27

11 Answers11

212

1) To get Projects with at least 1 vacancy:

Project.joins(:vacancies).group('projects.id')

2) To get Projects with more than 1 vacancy:

Project.joins(:vacancies).group('projects.id').having('count(project_id) > 1')

3) Or, if Vacancy model sets counter cache:

belongs_to :project, counter_cache: true

then this will work, too:

Project.where('vacancies_count > ?', 1)

Inflection rule for vacancy may need to be specified manually?

Dorian
  • 22,759
  • 8
  • 120
  • 116
Arta
  • 5,127
  • 5
  • 25
  • 23
  • 3
    Should this not be `Project.joins(:vacancies).group('projects.id').having('count(vacancies.id) > 1')`? Querying the number of vacancies instead of the project ids – Keith Mattix Jul 30 '18 at 19:29
  • 4
    No, @KeithMattix, it **should** not be. It **can** be, however, if it reads better to you; it's a matter of preference. The count can be done with any field in the join table that is guaranteed to have a value in every row. Most meaningful candidates are `projects.id`, `project_id`, and `vacancies.id`. I chose to count `project_id` because it is the field on which the join is made; the spine of the join if you will. It also reminds me that this is a join table. – Arta Apr 05 '20 at 17:24
73

joins uses an inner join by default so using Project.joins(:vacancies) will in effect only return projects that have an associated vacancy.

UPDATE:

As pointed out by @mackskatz in the comment, without a group clause, the code above will return duplicate projects for projects with more than one vacancies. To remove the duplicates, use

Project.joins(:vacancies).group('projects.id')

UPDATE:

As pointed out by @Tolsee, you can also use distinct.

Project.joins(:vacancies).distinct

As an example

[10] pry(main)> Comment.distinct.pluck :article_id
=> [43, 34, 45, 55, 17, 19, 1, 3, 4, 18, 44, 5, 13, 22, 16, 6, 53]
[11] pry(main)> _.size
=> 17
[12] pry(main)> Article.joins(:comments).size
=> 45
[13] pry(main)> Article.joins(:comments).distinct.size
=> 17
[14] pry(main)> Article.joins(:comments).distinct.to_sql
=> "SELECT DISTINCT \"articles\".* FROM \"articles\" INNER JOIN \"comments\" ON \"comments\".\"article_id\" = \"articles\".\"id\""
jvnill
  • 29,479
  • 4
  • 83
  • 86
43

Yeah, vacancies is not a field in the join. I believe you want:

Project.joins(:vacancies).group("projects.id").having("count(vacancies.id)>0")
Andrew Nesbitt
  • 5,976
  • 1
  • 32
  • 36
Peter Alfvin
  • 28,599
  • 8
  • 68
  • 106
26
# None:
Project.left_joins(:vacancies).group('projects.id').having('count(vacancies) = 0')
# Any
Project.joins(:vacancies).group('projects.id').having('count(vacancies) > 0')
# One
Project.joins(:vacancies).group('projects.id').having('count(vacancies) = 1')
# More than 1
Project.joins(:vacancies).group('projects.id').having('count(vacancies) > 1')
Bruno Degomme
  • 883
  • 10
  • 11
Dorian
  • 22,759
  • 8
  • 120
  • 116
8

Performing an inner join to the has_many table combined with a group or uniq is potentially very inefficient, and in SQL this would be better implemented as a semi-join that uses EXISTS with a correlated subquery.

This allows the query optimiser to probe the vacancies table to check for the existence of a row with the correct project_id. It doesn't matter whether there is one row or a million that have that project_id.

That's not as straightforward in Rails, but can be achieved with:

Project.where(Vacancies.where("vacancies.project_id = projects.id").exists)

Similarly, find all projects that have no vacancies:

Project.where.not(Vacancies.where("vacancies.project_id = projects.id").exists)

Edit: in recent Rails versions you get a deprecation warning telling you to not to rely on exists being delegated to arel. Fix this with:

Project.where.not(Vacancies.where("vacancies.project_id = projects.id").arel.exists)

Edit: if you're uncomfortable with raw SQL, try:

Project.where.not(Vacancies.where(Vacancy.arel_table[:project_id].eq(Project.arel_table[:id])).arel.exists)

You can make this less messy by adding class methods to hide the use of arel_table, for example:

class Project
  def self.id_column
    arel_table[:id]
  end
end

... so ...

Project.where.not(
  Vacancies.where(
    Vacancy.project_id_column.eq(Project.id_column)
  ).arel.exists
)
David Aldridge
  • 51,479
  • 8
  • 68
  • 96
  • these two suggestions don't seem to work... the subquery `Vacancy.where("vacancies.project_id = projects.id").exists?` yields either `true` or `false`. `Project.where(true)` is an `ArgumentError`. – Les Nightingill Oct 18 '19 at 04:41
  • `Vacancy.where("vacancies.project_id = projects.id").exists?` is not going to execute – it will raise an error because the `projects` relation won't exist in the query (and there's no question mark in the sample code above either). So decomposing this into two expressions isn't valid and doesn't work. In recent Rails `Project.where(Vacancies.where("vacancies.project_id = projects.id").exists)` raises a deprecation warning ... I'll update the question. – David Aldridge Oct 18 '19 at 11:17
5

In Rails 4+, you can also use includes or eager_load to get the same answer:

Project.includes(:vacancies).references(:vacancies).
        where.not(vacancies: {id: nil})

Project.eager_load(:vacancies).where.not(vacancies: {id: nil})
konyak
  • 10,818
  • 4
  • 59
  • 65
5

I think there's a simpler solution:

Project.joins(:vacancies).distinct
Yuri Karpovich
  • 382
  • 4
  • 10
4

Without much Rails magic, you can do:

Project.where('(SELECT COUNT(*) FROM vacancies WHERE vacancies.project_id = projects.id) > 0')

This type of conditions will work in all Rails versions as much of the work is done directly on the DB side. Plus, chaining .count method will work nicely too. I've been burned by queries like Project.joins(:vacancies) before. Of course, there are pros and cons as it's not DB agnostic.

konyak
  • 10,818
  • 4
  • 59
  • 65
  • 1
    This is much slower than join and group method, as the 'select count(*)..' subquery will execute for each projects. – YasirAzgar Mar 02 '18 at 05:51
  • @YasirAzgar The join and group method is slower than the "exists" method because it will still access all the child rows, even if there are a million of them. – David Aldridge Aug 06 '20 at 08:43
0

You can also use EXISTS with SELECT 1 rather than selecting all the columns from the vacancies table:

Project.where("EXISTS(SELECT 1 from vacancies where projects.id = vacancies.project_id)")
K M Rakibul Islam
  • 33,760
  • 12
  • 89
  • 110
0

If I want to know how many records have at least one of an associated record, I would do:

Project.joins(:vacancies).uniq.count
drjorgepolanco
  • 7,479
  • 5
  • 46
  • 47
-6

The error is telling you that vacancies is not a column in projects, basically.

This should work

Project.joins(:vacancies).where('COUNT(vacancies.project_id) > 0')
wkhatch
  • 2,664
  • 7
  • 36
  • 45