9

I want to restructure the query below using Squeel. I'd like to do this so that I can chain the operators in it and re-use the logic in the different parts of the query.

User.find_by_sql("SELECT 
    users.*,
    users.computed_metric,
    users.age_in_seconds,
    ( users.computed_metric / age_in_seconds) as compound_computed_metric
    from
    (
      select
        users.*,
        (users.id *2 ) as computed_metric,
        (extract(epoch from now()) - extract(epoch from users.created_at) ) as age_in_seconds
        from users
    ) as users")

The query has to all operate in the DB and should not be a hybrid Ruby solution since it has to order and slice millions of records.

I've set the problem up so that it should run against a normal user table and so that you can play with the alternatives to it.

Restrictions on an acceptable answer

  • the query should return a User object with all the normal attributes
  • each user object should also include extra_metric_we_care_about, age_in_seconds and compound_computed_metric
  • the query should not duplicate any logic by just printing out a string in multiple places - I want to avoid doing the same thing twice
  • [updated] The query should all be do-able in the DB so that a result set that may consist of millions of records can be ordered and sliced in the DB before returning to Rails
  • [updated] The solution should work for a Postgres DB

Example of the type of solution I'd like

The solution below doesn't work but it shows the type of elegance that I'm hoping to achieve

class User < ActiveRecord::Base
# this doesn't work - it just illustrates what I want to achieve

  def self.w_all_additional_metrics
    select{ ['*', 
              computed_metric, 
              age_in_seconds, 
              (computed_metric / age_in_seconds).as(compound_computed_metric)] 
      }.from{ User.w.compound_computed_metric.w_age_in_seconds }
  end

  def self.w_computed_metric
    where{ '(id *2 ) as computed_metric' }
  end

  def self.w_age_in_seconds
    where{ '(extract(epoch from now()) - extract(epoch from created_at) ) as age_in_seconds' }
  end
end

You should be able to run this against your existing database

Please note that I've somewhat contrived the problem so that you can use your existing User class and play with it in your console.

EDIT

  1. The DB I'm using is Postgres.
  2. I'm not sure I made it 100% clear that the query should all execute in the DB. It can't be a hybrid answer were some of the logic is essentially done in Rails. This is important since I want to be able to order and slice millions of records using the computed columns.
Peter Nixey
  • 16,187
  • 14
  • 79
  • 133
  • 2
    Just lobbing softballs my way... how hard could solving this be... ugh! The race is on to get the bounty! – erroric Aug 09 '13 at 14:35
  • I'm not sure that this is actually possible without violating your third restriction on an acceptable answer. In case you haven't already had a look, I've taken steps towards something similar [here](http://stackoverflow.com/questions/15412142/ranking-results-with-complex-conditions-using-rails-and-squeel) and [here](http://stackoverflow.com/questions/15274910/multiple-joins-to-the-same-model-using-multiple-belongs-to-associations). – erroric Aug 09 '13 at 20:36
  • I don't get it. What you described can be simply solved without any fancy squeel chaining. There is no need to use a sub-query. The compound_computed_metric is calculated against each row in the table, and there is no aggregation. – wanghq Aug 10 '13 at 17:51
  • @wanghq - that would be great if it is the case. Could you write the query that achieves that? erroric - I'll take a look, ty – Peter Nixey Aug 10 '13 at 22:54
  • @erroric - it's avoiding the duplicated SQL (and DB work) that's hard. I don't think it fits any of the other two examples that you posted although you're clearly thinking along the same lines as me – Peter Nixey Aug 11 '13 at 12:49

3 Answers3

7

I have 2 sulutions in your case. My database is mysql, and I simplify your code for demo, I think you can extend it.

The first is Squeel way, I mixed "sift" in Squeel and "from" in ActiveRecord Query. I installed postgresql and tested my solution just now, It seems hardly to use "squeel" and "epoch from" together, but I found an alternative way in postgresql, it called "date_part". I also modified the sql and reduced the duplications of calculation:

class User < ActiveRecord::Base           
  sifter :w_computed_metric do
    (id * 2).as(computed_metric)
  end

  sifter :w_age_in_seconds do
    (date_part('epoch' , now.func) - date_part('epoch', created_at)).as(age_in_seconds)
  end

  sifter :w_compound_computed_metric do
    (computed_metric / age_in_seconds).as(compound_computed_metric)
  end

  def self.subquery
    select{['*', sift(w_computed_metric) , sift(w_age_in_seconds)]}
  end

  def self.w_all_additional_metrics
    select{['*', sift(w_compound_computed_metric)]}.from("(#{subquery.to_sql}) users")
  end      
end

It produced the sql:

SELECT *, "users"."computed_metric" / "users"."age_in_seconds" AS compound_computed_metric 
FROM (SELECT *, 
             "users"."id" * 2 AS computed_metric, 
             date_part('epoch', now()) - date_part('epoch', "users"."created_at") AS age_in_seconds FROM "users" 
     ) users

You can test it using the console:

> User.w_all_additional_metrics.first.computed_metric
=> "2"
> User.w_all_additional_metrics.first.age_in_seconds
=> "633.136693954468"
> User.w_all_additional_metrics.first.compound_computed_metric
=> "0.00315887551471441"

The second is ActiveRecord way, because your sql is not very complicate, so you can chain it in ActiveRecord Query, it's enough with some scopes :

class User < ActiveRecord::Base
  scope :w_computed_metric, proc { select('id*2 as computed_metric') }
  scope :w_age_in_seconds, proc { select('extract (epoch from (now()-created_at)) as age_in_seconds') }
  scope :w_compound_computed_metric, proc { select('computed_metric/age_in_seconds as compound_computed_metric') }

  def self.subquery
    select('*').w_computed_metric.w_age_in_seconds
  end

  def self.w_all_additional_metrics
    subquery.w_compound_computed_metric.from("(#{subquery.to_sql}) users")
  end
end

This produces the following SQL:

SELECT 
  *, id*2 as computed_metric, 
  extract (epoch from (now()-created_at)) as age_in_seconds, 
  computed_metric / age_in_seconds as compound_computed_metric
FROM (
    SELECT 
      *, 
      id*2 as computed_metric, 
      extract (epoch from (now()-created_at)) as age_in_seconds 
    FROM 
      "users" 
    ) users 
ORDER BY compound_computed_metric DESC 
LIMIT 1

Hope it meets your requirement :)

Bigxiang
  • 6,252
  • 3
  • 22
  • 20
  • Thank you for this, your answer's very interesting. Unfortunately though I was looking to solve this in Sqeel rather than Sequel though it was interesting to discover that Sequel existed as I hadn't heard of it before! – Peter Nixey Aug 17 '13 at 15:18
  • I am very sorry, I read "squeel" to "sequel".... , I have read the document of squeel and will update the answer. – Bigxiang Aug 18 '13 at 08:08
  • I'm really impressed with this. I haven't managed to make the Squeel version work just yet but am still playing with it to see if I can. I really like your pure AR version however I've got a question. I've put the SQL that it generates into your answer (feel free to delete it but comments get crunched). If you notice in the SQL the computations are actually done twice - once in the subquery and then once in the super-query. Is it possible to reduce them to only happening once to save on computation? If you can do it cleanly without the computational duplication the points are yours. – Peter Nixey Aug 19 '13 at 10:31
  • I think that the Squeel query has the same problem ast he other one - in postgres it needs the extract(epoch) function to work. Would be very interested if you knew how to add that into the Squeel – Peter Nixey Aug 19 '13 at 10:39
  • Hi, I updated the squeel solution again, please check it again :) – Bigxiang Aug 19 '13 at 13:52
  • Wow. I'm really impressed. I didn't think I was going to get an answer that works but as far as I can tell this does exactly what I was hoping for. Great work. Also can you ping me via email - petenixey at gmail. I'd value staying in touch. – Peter Nixey Aug 19 '13 at 16:21
  • Hi, I sent an email to you. I notice your name is peternixey, but email is petenixey. Is it correct? – Bigxiang Aug 20 '13 at 02:54
  • 2
    I can't believe how slick this is. Bigxiang you are my hero! One of the keys to this solution is using the alias of 'users' for the subquery in the `from` clause. Again, FANTASTIC! – erroric Aug 21 '13 at 14:50
  • 1
    And one more thing - you resolved the issue of `extract(epoch from ...)` that caused all sorts of grief for me. :) – erroric Aug 21 '13 at 14:53
2

It's very probably that I am totally wrong. I feel you simplify your problem too much in order to let others understand it. Since I can't give this well-formatted code in a comment, I input the answer here.

SELECT 
    users.*,
    users.computed_metric,
    users.age_in_seconds,
    ( users.computed_metric / age_in_seconds) as compound_computed_metric
    from
    (
      select
        users.*,
        (users.id *2 ) as computed_metric,
        (extract(epoch from now()) - extract(epoch from users.created_at) ) as age_in_seconds
        from users
    ) as users

Below SQL is equivalent to your above SQL. That's why I say the sub-query is not necessary.

select
  users.*,
  (users.id *2 ) as computed_metric,
  (extract(epoch from now()) - extract(epoch from users.created_at) ) as age_in_seconds,
  computed_metric/age_in_seconds as compound_computed_metric
  from users

If that's right, then the compound_computed_metric can be calculated in below way. No custom query is needed.

class User < ActiveRecord::Base

  def compound_computed_metric
    computed_metric/age_in_seconds
  end
  def computed_metric
    self.id * 2
  end
  def age_in_seconds
    Time.now - self.created_at
  end
end

1.9.3p327 :001 > u = User.first
  User Load (0.1ms)  SELECT "users".* FROM "users" LIMIT 1
 => #<User id: 1, name: "spider", created_at: "2013-08-10 04:29:35", updated_at: "2013-08-10 04:29:35">
1.9.3p327 :002 > u.compound_computed_metric
 => 1.5815278998954843e-05
1.9.3p327 :003 > u.age_in_seconds
 => 126471.981447
1.9.3p327 :004 > u.computed_metric
 => 2
Peter Nixey
  • 16,187
  • 14
  • 79
  • 133
wanghq
  • 1,336
  • 9
  • 17
  • Hi Wang, my apologies. In the original SQL I forgot to request the compound metric and age in seconds in addition to the compound computed metric. You're right, I oversimplified it. I've just updated the question to include the request for both of them in the original SQL. Thank you – Peter Nixey Aug 11 '13 at 09:06
  • I updated my answer accordingly. Still feel the subquery is not necessary. – wanghq Aug 11 '13 at 15:44
  • Did you manage to make that work because I got the following errors: `ERROR: column "computed_metric" does not exist LINE 5: computed_metric/age_in_seconds as compound_computed_metric` - I don't think you can use a computed column a second time in a query (I fixed some syntactic errors in your answer btw). Also thank you for your time on this – Peter Nixey Aug 12 '13 at 10:49
  • This question shows the issue on re-using columns: http://stackoverflow.com/questions/8840228/postgresql-using-a-calculated-column-in-the-same-query. hence the sub-query! – Peter Nixey Aug 12 '13 at 10:50
  • which SQL database do you use? I used the default sqlite for experiment and have no problem doing the operations as I showed above (see console output). – wanghq Aug 13 '13 at 00:07
  • I was using Postgres - there seemed to be a few extraneous commas which were making things fail. BTW I'm sorry that I screwed up with the bounty. I didn't really know how it was going to work so I'm going to create a new bounty and split it between you and @erroric (if I can figure out how) – Peter Nixey Aug 14 '13 at 10:08
1

Let's preface this with it's not the answer you are looking for...

Now, with that out of the way, here is what I tried and how it relates to the two links I posted in the comments of the question.

class User < ActiveRecord::Base
  # self-referential association - more on this later
  belongs_to :myself, class_name: "User", foreign_key: :id

  scope :w_computed_metric, ->() { select{[id, (id *2).as(computed_metric)]} }
  scope :w_age_in_seconds,  ->() { select{[id, (extract('epoch from now()') - extract('epoch from users.created_at')).as(age_in_seconds)]} }
  scope :w_default_attributes, ->() { select{`*`} }

  def self.compound_metric
    scope = User.w_default_attributes.select{(b.age_in_seconds / a.computed_metric).as(compound_metric)}
    scope = scope.joins{"inner join (" + User.w_computed_metric.to_sql + ") as a on a.id = users.id"}
    scope = scope.joins{"inner join (" + User.w_age_in_seconds.to_sql + ") as b on b.id = users.id"}
  end

  sifter :sift_computed_metric do
    (id * 2).as(computed_metric)
  end

  sifter :sift_age_in_seconds do
    (extract(`epoch from now()`) - extract(`epoch from users.created_at`)).as(age_in_seconds)
  end

  def self.using_sifters_in_select
    User.w_default_attributes.joins{myself}.select{[(myself.sift :sift_computed_metric), (myself.sift :sift_age_in_seconds)]}
  end

  def self.using_from
    scope = User.w_default_attributes
    scope = scope.select{[(age_in_seconds / computed_metric).as(compound_metric)]}
    scope = scope.from{User.w_computed_metric.w_age_in_seconds}
  end
end

So, running User.compound_metric in console will yield the results you are looking for - a User object with the additional attributes: computed_metric, age_in_seconds, and compound_metric. Unfortunately, this violates the third constraint you placed on an acceptable answer. Oh well...

I also tried a few other things (as you can see from above):

First point of note is the self-referential association, which I'm quite proud of - even though it doesn't get us to where we want to go.

belongs_to :myself, class_name: "User", foreign_key: :id

This nifty piece of code lets you access the same object through a join. Why is this important? Well, Squeel will only allow you to access associations through the joins{} method unless you pass it a string of SQL. This lets us use the sifters feature of Squeel - in this case not to filter the results, rather to include aggregate columns from the db and let Squeel do the heavy lifting of aliasing and joining the statements. You can test it with the

def self.using_sifters_in_select
  User.w_default_attributes.joins{myself}.select{[(myself.sift :sift_computed_metric), (myself.sift :sift_age_in_seconds)]}
end

The beauty of sifters to acheive this is the chainability and syntatic sugar - it is very flat and readable.

The last bit I tried playing with is .from{}. Before this question, I didn't even know it existed. I was soooo excited with the possibility that I had missed something so simple as including a source for a query (in this case a sub-select). Testing with using_from

def self.using_from
    scope = User.w_default_attributes
    scope = scope.select{[(age_in_seconds / computed_metric).as(compound_metric)]}
    scope = scope.from{User.w_computed_metric.w_age_in_seconds}
end

results in a TypeError:

TypeError: Cannot visit Arel::SelectManager
  from /home/prg10itd/projects/arel/lib/arel/visitors/visitor.rb:28:in `rescue in visit'
  from /home/prg10itd/projects/arel/lib/arel/visitors/visitor.rb:19:in `visit'
  from /home/prg10itd/projects/arel/lib/arel/visitors/to_sql.rb:348:in `visit_Arel_Nodes_JoinSource'
  from /home/prg10itd/projects/arel/lib/arel/visitors/visitor.rb:21:in `visit'
  from /home/prg10itd/projects/arel/lib/arel/visitors/to_sql.rb:139:in `visit_Arel_Nodes_SelectCore'
  from /home/prg10itd/projects/arel/lib/arel/visitors/to_sql.rb:121:in `block in visit_Arel_Nodes_SelectStatement'
  from /home/prg10itd/projects/arel/lib/arel/visitors/to_sql.rb:121:in `map'
  from /home/prg10itd/projects/arel/lib/arel/visitors/to_sql.rb:121:in `visit_Arel_Nodes_SelectStatement'
  from /home/prg10itd/projects/arel/lib/arel/visitors/visitor.rb:21:in `visit'
  from /home/prg10itd/projects/arel/lib/arel/visitors/visitor.rb:5:in `accept'
  from /home/prg10itd/projects/arel/lib/arel/visitors/to_sql.rb:19:in `accept'

(and yes, I'm testing against a local copy of the Arel and Squeel). I'm not familiar enough with the internal workings of Arel to resolve the issue without further effort (and most likely a fork of Arel). It does appear that Squeel just passes the from{} method to the Arel from() method without doing anything (beyond the rest of the magic that is Squeel).

So where does that leave us? A solution that works, but is not as nice and elegant as I wish it was - but maybe someone else can leverage this to a better solution.

PS - this is with Rails v3.2.13 and the respective version of Arel. The source for Rails v4 and Arel are quite different and not tested for this.

erroric
  • 991
  • 1
  • 11
  • 22
  • Hi Erroric and apologies for taking so long to reply to your excellent answer. I believe I have hit the same error as you on the `from` method before and I *think* it's something to do with a gem incompatibility. Your answer was really interesting in that it took me towards trying to discover what sifters are (which I've never previously known) and also gave me a good way to look at things differently. I've upvoted it and am also happy to upvote any comments. Thank you – Peter Nixey Aug 17 '13 at 15:14