19

I am attempting to nest SELECT queries in Arel and/or Active Record in Rails 3 to generate the following SQL statement.

SELECT sorted.* FROM (SELECT * FROM points ORDER BY points.timestamp DESC) AS sorted GROUP BY sorted.client_id

An alias for the subquery can be created by doing

points = Table(:points)
sorted = points.order('timestamp DESC').alias

but then I'm stuck as how to pass it into the parent query (short of calling #to_sql, which sounds pretty ugly).

How do you use a SELECT statement as a sub-query in Arel (or Active Record) to accomplish the above? Maybe there's an altogether different way to accomplish this query that doesn't use nested queries?

Schrockwell
  • 838
  • 1
  • 8
  • 25

5 Answers5

25

Here's my approach to temporary tables and Arel. It uses Arel#from method passing in the inner query with Arel#to_sql.

inner_query = YourModel.where(:stuff => "foo")
outer_query = YourModel.scoped  # cheating, need an ActiveRelation
outer_query = outer_query.from(Arel.sql("(#{inner_query.to_sql}) as results")).
                          select("*")

Now you can do some nice things with the outer_query, paginate, select, group, etc...

inner_query ->

select * from your_models where stuff='foo'

outer_query ->

select * from (select * from your_models where stuff='foo') as results;
todd
  • 2,381
  • 1
  • 20
  • 11
  • 4
    You can also get an outer_query without having to specify a fake model or table name. The last two lines in the above can be replaced by this line, which is what "from" is calling anyways: outer_query = Arel::SelectManager.new(Arel::Table.engine, Arel.sql("(#{inner_query.to_sql}) as results")) – DSimon Mar 01 '12 at 21:31
8

The question is why would you need a "nested query"? We do not need to use "nested queries" this is thinking in the mindset of SQL not Relational Algebra. With relational algebra we derive relations and use the output of one relation as input to another so the following would hold true:

points = Table(:points, {:as => 'sorted'}) # rename in the options hash
final_points = points.order('timestamp DESC').group(:client_id, :timestamp).project(:client_id, :timestamp)

It's best if we leave the renaming to arel unless absolutely necessary.

Here the projection of client_id AND timestamp is VERY important since we cannot project all domains from the relation (i.e. sorted.*). You must specifically project all domains that will be used within the grouping operation for the relation. The reason being is there is no value for * that would be distinctly representative of a grouped client_id. For instance say you have the following table

client_id   |   score
----------------------
    4       |    27
    3       |    35
    2       |    22
    4       |    69

Here if you group you could not perform a projection on the score domain because the value could either be 27 or 69 but you could project a sum(score)

You may only project the domain attributes that have unique values to the group (which are usually aggregate functions like sum, max, min). With your query it would not matter if the points were sorted by timestamp because in the end they would be grouped by client_id. the timestamp order is irrelevant since there is no single timestamp that could represent a grouping.

Please let me know how I can help you with Arel. Also, I have been working on a learning series for people to use Arel at its core. The first of the series is at http://Innovative-Studios.com/#pilot I can tell you are starting to know how to since you used Table(:points) rather than the ActiveRecord model Point.

Snuggs
  • 802
  • 1
  • 5
  • 6
  • Thank you for the detailed response. "the timestamp order is irrelevant since there is no single timestamp that could represent a grouping." You're right; I see what you are saying. It appears MySQL works around this inconsistency by returning just the first row of the client_id group, which is what I was aiming for. I see now this is not behavior I should count on. My goal is to return the most recent point for all client_ids, i.e. a single point with the maximum timestamp per each client_id grouping. It is important to do in one query because it will be polled often. – Schrockwell May 25 '10 at 01:22
  • We would need to use some aggregate function. If we ask ourselves "What are we trying to do?" The answer would be to find the most recent or "maximum" date so we would pass max(timestamp) in sql. This would correspond to Arel::Attribute::Expression::Maximum which can be called with syntactic sugar on an Arel::Attribute like sorted[:timestamp].maximum() . There is one caveat. Make sure you add timestamp to group operation #group('client_id, timestamp') or the entire grouping scenario will error. I know the MAX aggregate function works on dates in Postgres and I'm sure in MySQL as well. – Snuggs May 25 '10 at 10:14
  • 2
    Firstly, sorting and order is not part of relational algebra. Arel defines it anyway. Secondly, whether or not subqueries are part of relational algebra is irrelevant. Conceptually, the result of a SELECT is not visible until the WHERE clause executes. Therefore not all databases (eg. Postgres) allow column aliases in WHERE clauses and instead depend on subqueries. If Arel cannot handle subqueries then names in the WHERE clause cannot be aliased. This can get messy when you can't depend on Arel to generate names. – Samuel Danielson Oct 06 '10 at 20:19
  • @SamuelDanielson As far as I recall, Postgres allows column aliases in WHERE clauses. So does every other SQL database that I'm aware of. – Marnen Laibow-Koser Sep 19 '12 at 16:31
  • @MarnenLaibow-Koser: MySQL allows this, but PostgreSQL absolutely does not permit using aliases defined in the SELECT clause to be used in the WHERE clause of the same query. They can be used in the WHERE clause of a wrapper query - which is why a subquery would be useful in such a scenario. Alternatively, a CTE can be used. Neither of these approaches are handled very well by Arel without some direct manipulation to that effect. – PinnyM May 02 '13 at 14:04
  • @PinnyM As far as I know, your description of Postgres's treatment of column aliases is incorrect. I'll run tests and report back if I'm wrong. – Marnen Laibow-Koser May 09 '13 at 23:55
  • 1
    @PinnyM Well, I learn something new every day! I thought I'd used aliases in `WHERE`, but apparently I'd only used them in `JOIN`. This led me to http://stackoverflow.com/a/942592/109011 , where I found that this seems to be an SQL limitation. Further testing confirms that only SQLite supports column aliases in `WHERE` clauses. However, your original example has a further problem: `sum` is an aggregate function, and aggregates aren't calculated at the time that the `WHERE` clause is run. So I put together http://www.sqlfiddle.com/#!12/86136/4 , which more clearly illustrates the point. – Marnen Laibow-Koser May 10 '13 at 15:45
7
Point.
 from(Point.order(Point.arel_table[:timestamp].desc).as("sorted")).
 select("sorted.*").
 group("sorted.client_id")
Subba Rao
  • 10,576
  • 7
  • 29
  • 31
7

Although I don't think this problem needs nested queries, like Snuggs mentioned. For those who do need nested queries. This is what I got working so far, not great but it works:

class App < ActiveRecord::Base   
  has_many :downloads

  def self.not_owned_by_users(user_ids)
    where(arel_table[:id].not_in( 
      Arel::SqlLiteral.new( Download.from_users(user_ids).select(:app_id).to_sql ) ) )
  end
end

class Download  < ActiveRecord::Base
  belongs_to :app
  belongs_to :user

  def self.from_users(user_ids)
    where( arel_table[:user_id].in user_ids )
  end

end

class User < ActiveRecord::Base
  has_many :downloads
end

App.not_owned_by_users([1,2,3]).to_sql #=>
# SELECT `apps`.* FROM `apps` 
# WHERE (`apps`.`id` NOT IN (
#   SELECT app_id FROM `downloads` WHERE (`downloads`.`user_id` IN (1, 2, 3))))
#
Jeroen van Dijk
  • 1,029
  • 10
  • 16
1

To do this in "pure" Arel, this worked for me:

points = Arel::Table.new('points')
sorted = Arel::Table.new('points', as: 'sorted')
query = sorted.from(points.order('timestamp desc').project('*')).project(sorted[Arel.star]).group(sorted[:client_id])
query.to_sql

Of course, in your case, points and sorted would be retrieved and tailored from the Points model as opposed to manufactured as above.

Raels
  • 400
  • 2
  • 8