
The title is confusing, but allow me to explain. I have a Car model that has multiple datapoints with different timestamps. We are almost always concerned with the attributes of its latest status, so the model has_many statuses, along with a has_one that gives easy access to its latest one:

class Car < ActiveRecord::Base
  has_many :statuses, class_name: 'CarStatus', order: "timestamp DESC"
  has_one :latest_status, class_name: 'CarStatus', order: "timestamp DESC"

  delegate :location, :timestamp, to: 'latest_status', prefix: 'latest', allow_nil: true

  # ...
end

To give you an idea of what the statuses hold:

loc = Car.first.latest_location   # Location object (id = 1 for example)
loc.name                          # "Miami, FL"

Let's say I wanted a (chainable) scope to find all cars whose latest location id is 1. Currently I have a rather convoluted method:

# car.rb
def self.by_location_id(id)
  ids = []
  find_each(include: :latest_status) do |car|
    ids << car.id if car.latest_status.try(:location_id) == id.to_i
  end
  where("id in (?)", ids)
end

There may be a quicker way to do this in SQL, but I'm not sure how to get only the latest status for each car. There may be many status records with a location_id of 1, but if that isn't the latest location for its car, the car should not be included.
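To make that rule concrete, here is a plain-Ruby sketch with no database involved (the `StatusRow` struct and the sample data are hypothetical). It contrasts a naive filter, which keeps any car that has ever had a status at the location, with the intended one, which looks only at each car's newest status:

```ruby
# Hypothetical in-memory stand-in for CarStatus rows (not ActiveRecord).
StatusRow = Struct.new(:car_id, :location_id, :timestamp)

SAMPLE_STATUSES = [
  StatusRow.new(1, 1, Time.utc(2012, 8, 1)),
  StatusRow.new(1, 2, Time.utc(2012, 8, 2)), # car 1 has since moved to location 2
  StatusRow.new(2, 3, Time.utc(2012, 8, 1)),
  StatusRow.new(2, 1, Time.utc(2012, 8, 3))  # car 2's latest status is at location 1
].freeze

# Naive: any car that has *ever* had a status at the location.
def naive_car_ids(statuses, location_id)
  statuses.select { |s| s.location_id == location_id }.map(&:car_id).uniq.sort
end

# Intended: only cars whose *latest* status is at the location.
def latest_car_ids(statuses, location_id)
  statuses.group_by(&:car_id)
          .transform_values { |rows| rows.max_by(&:timestamp) }
          .select { |_car_id, latest| latest.location_id == location_id }
          .keys.sort
end

p naive_car_ids(SAMPLE_STATUSES, 1)  # => [1, 2]
p latest_car_ids(SAMPLE_STATUSES, 1) # => [2]
```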

To make it harder, let's add another level: scoping by location name. I have this method, which preloads the latest statuses along with their location objects so it can read the name:

def self.by_location_name(loc)
  ids = []
  find_each(include: {latest_status: :location}) do |car|
    ids << car.id if car.latest_location.try(:name) =~ /#{Regexp.escape(loc)}/i
  end
  where("id in (?)", ids)
end

This will match the location above with "miami", "fl", "MIA", etc. Does anyone have any suggestions on how I can make this more succinct or efficient? Would it be better to define my associations differently? Or maybe it will take some SQL ninja skills, which I admittedly don't have.
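The name matching above is a case-insensitive substring test. Interpolating raw input straight into a regex lets characters such as `.` or `(` act as regex syntax (an unbalanced `(` even raises `RegexpError`). A small plain-Ruby sketch, using a hypothetical helper `location_name_matches?`, escapes the term with `Regexp.escape` so it is matched literally:

```ruby
# Hypothetical helper: case-insensitive substring match of a search term
# against a location name, with the term escaped so it is matched literally.
def location_name_matches?(name, term)
  return false if name.nil? # mirrors the .try(:name) nil case
  name.match?(/#{Regexp.escape(term)}/i)
end

%w[miami fl MIA].each do |term|
  puts "#{term}: #{location_name_matches?('Miami, FL', term)}"
end
# miami: true
# fl: true
# MIA: true

# "." is a literal dot after escaping, so this no longer matches "Mia".
puts location_name_matches?('Miami, FL', 'M.A') # false
```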

Using Postgres 9.1 (hosted on Heroku cedar stack)

Erik J
  • I have a feeling that an efficient solution to your problem will be somewhat database specific. Could you please add the name and version of your database to the question? – MrTheWalrus Aug 01 '12 at 17:40
  • @MrTheWalrus good call, I am using the Heroku cedar stack with Postgres 9.1 – Erik J Aug 02 '12 at 01:22
    It may be easier to keep latest statuses and old statuses in separate tables. That way you still have your history, but you don't have difficulties querying. Somewhat related: http://stackoverflow.com/questions/762405/database-data-versioning – Mischa Aug 02 '12 at 02:41

1 Answer


All right. Since you're using Postgres 9.1 like I am, I'll take a shot at this. Tackling the first problem first (a scope to filter by the location of the latest status):

This solution takes advantage of PostgreSQL's support for window (analytic) functions, as described here: http://explainextended.com/2009/11/26/postgresql-selecting-records-holding-group-wise-maximum/

I think the following gives you part of what you need (replace/interpolate the location id you're interested in for the '?', naturally):

select * 
from (
  select cars.id as car_id, statuses.id as status_id, statuses.location_id, statuses.created_at, row_number() over (partition by statuses.car_id order by statuses.created_at desc) as rn 
  from cars join statuses on cars.id = statuses.car_id
) q
where rn = 1 and location_id = ?

This query will return car_id, status_id, location_id, and a timestamp (called created_at by default, although you could alias it if some other name is easier to work with).
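To see what `row_number() over (partition by ... order by ... desc)` computes, here is a plain-Ruby emulation (an illustration only; `WindowRow` and the sample rows are hypothetical). Each car's rows are numbered 1, 2, ... newest first, so keeping `rn = 1` leaves exactly one row per car. Partitioning by the status id instead would put every row in its own partition and give every row `rn = 1`, so the partition key has to be the car id.

```ruby
# Hypothetical stand-in for rows of the cars/statuses join.
WindowRow = Struct.new(:status_id, :car_id, :location_id, :created_at)

# Emulates: row_number() over (partition by car_id order by created_at desc)
# Returns [row, rn] pairs; rn starts at 1 within each car's partition.
def with_row_numbers(rows)
  rows.group_by(&:car_id).flat_map do |_car_id, partition|
    partition.sort_by { |r| -r.created_at.to_i }
             .each_with_index.map { |r, i| [r, i + 1] }
  end
end

rows = [
  WindowRow.new(10, 1, 1, Time.utc(2012, 8, 1)),
  WindowRow.new(11, 1, 2, Time.utc(2012, 8, 2)),
  WindowRow.new(12, 2, 1, Time.utc(2012, 8, 3))
]

# Equivalent of the outer filter: where rn = 1 and location_id = 1
latest_at_1 = with_row_numbers(rows)
  .select { |r, rn| rn == 1 && r.location_id == 1 }
  .map { |r, _rn| r.car_id }

p latest_at_1 # => [2]
```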

Now to convince Rails to return results based on this. Because you'll probably want to use eager loading with this, find_by_sql is pretty much out. There is a trick I discovered though, using .joins to join to a subquery. Here's approximately what it might look like:

def self.by_location(loc)
  joins(
    self.escape_sql('join (
    select * 
    from (
      select cars.id as car_id, statuses.id as status_id, statuses.location_id, statuses.created_at, row_number() over (partition by statuses.car_id order by statuses.created_at desc) as rn 
      from cars join statuses on cars.id = statuses.car_id
    ) q
    where rn = 1 and location_id = ?
    ) as subquery on subquery.car_id = cars.id', loc)
  ).order('subquery.created_at desc') # ordering kept out of the join string so the scope stays chainable
end

The inner join acts as a filter, so you only get back the Car records that appear in the subquery.

Note: to call escape_sql as I do above, you'll need to extend ActiveRecord::Base slightly. I do this in an initializer (config/initializers/active_record.rb):

class ActiveRecord::Base
  def self.escape_sql(clause, *rest)
    self.send(:sanitize_sql_array, rest.empty? ? clause : ([clause] + rest))
  end
end

This lets you call escape_sql on any model that inherits from ActiveRecord::Base. I find this profoundly useful, but if you've got some other way to sanitize SQL, feel free to use that instead.
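For readers wondering what sanitizing buys over string interpolation, here is a deliberately simplified plain-Ruby picture of `?`-placeholder binding. It is not Rails' implementation (in the app, keep using `sanitize_sql_array` through `escape_sql`); `bind_placeholders` and `quote_literal` are hypothetical names, and the quoting only covers `nil`, integers and strings. The point is that each value becomes a quoted SQL literal, so an embedded `'` cannot end the string early:

```ruby
# Simplified illustration only, NOT Rails' code: in an app, use
# sanitize_sql_array (or the escape_sql wrapper) instead.
def quote_literal(value)
  case value
  when nil     then 'NULL'
  when Integer then value.to_s
  else "'#{value.to_s.gsub("'", "''")}'" # SQL escapes ' by doubling it
  end
end

# Replaces each literal "?" with the next value, quoted.
def bind_placeholders(sql, *values)
  result = sql.gsub('?') do
    raise ArgumentError, 'more ? placeholders than values' if values.empty?
    quote_literal(values.shift)
  end
  raise ArgumentError, 'more values than ? placeholders' unless values.empty?
  result
end

puts bind_placeholders('where location_id = ?', 1)
# where location_id = 1
puts bind_placeholders('where name = ?', "O'Hare")
# where name = 'O''Hare'
```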

For the second part of the question: unless there are multiple locations with the same name, I'd just use Location.find_by_name to turn the name into an id and pass that to the scope above. Basically this:

def self.by_location_name(name)
  loc = Location.find_by_name(name)
  by_location(loc.try(:id)) # an unknown name binds location_id = NULL, which matches no cars
end
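Here is a plain-Ruby sketch of that two-step lookup, using hypothetical in-memory data in place of the `locations` table and each car's latest status. Keep in mind that `find_by_name` is an exact match, so a partial term such as "miami" from the question resolves to no location at all; an unknown name should give an empty result rather than an error.

```ruby
# Hypothetical in-memory stand-ins for the locations table and for each
# car's latest status (one row per car).
LocationRow = Struct.new(:id, :name)
LatestRow   = Struct.new(:car_id, :location_id)

LOCATIONS       = [LocationRow.new(1, 'Miami, FL'), LocationRow.new(2, 'Tampa, FL')].freeze
LATEST_STATUSES = [LatestRow.new(1, 2), LatestRow.new(2, 1)].freeze

# Like Location.find_by_name: exact match, nil when nothing matches.
def find_location_by_name(name)
  LOCATIONS.find { |loc| loc.name == name }
end

def car_ids_by_location_id(location_id)
  LATEST_STATUSES.select { |s| s.location_id == location_id }.map(&:car_id)
end

# Step 1: name -> id. Step 2: filter by id. An unknown name yields no cars.
def car_ids_by_location_name(name)
  loc = find_location_by_name(name)
  loc ? car_ids_by_location_id(loc.id) : []
end

p car_ids_by_location_name('Miami, FL') # => [2]
p car_ids_by_location_name('miami')     # => []
```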
MrTheWalrus
  • Thanks for your answer. I tried briefly to get this to work without much success, but haven't had much time lately. I'll get back with more info. Thanks again – Erik J Aug 08 '12 at 01:28
  • FYI, I couldn't really get this to work; it may just be that I suck at SQL. What I eventually did, instead of depending on the timestamp to find the latest status every time, was put a `latest?` flag on CarStatus plus an after_save callback that makes sure the flag is set on the most recent status for that car. My Car scopes are much better now, since they just check that flag. – Erik J Aug 17 '12 at 15:03
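A plain-Ruby sketch of the invariant that flag approach maintains, under stated assumptions: the `latest` column name, `FlaggedStatus`, `StatusStore` and its methods are all hypothetical, and in the Rails app `refresh_latest_flag` would be the body of the CarStatus after_save callback. Recomputing the newest row from timestamps, rather than simply flagging whichever row was just saved, keeps the flag correct even when an older status is saved late.

```ruby
# Hypothetical stand-in for a CarStatus row with a boolean `latest` column.
FlaggedStatus = Struct.new(:id, :car_id, :location_id, :timestamp, :latest)

# In-memory store playing the role of the statuses table.
class StatusStore
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Save a status, then do what the after_save callback would do.
  def save(status)
    @rows << status unless @rows.any? { |r| r.equal?(status) }
    refresh_latest_flag(status.car_id)
    status
  end

  # Afterwards exactly one status per car, the newest by timestamp, is flagged.
  def refresh_latest_flag(car_id)
    statuses = @rows.select { |r| r.car_id == car_id }
    newest = statuses.max_by(&:timestamp)
    statuses.each { |r| r.latest = r.equal?(newest) }
  end

  # Rough equivalent of a Car scope that filters on the flag plus a location id.
  def car_ids_at(location_id)
    @rows.select { |r| r.latest && r.location_id == location_id }.map(&:car_id).sort
  end
end

store = StatusStore.new
store.save(FlaggedStatus.new(1, 1, 1, Time.utc(2012, 8, 1), false))
store.save(FlaggedStatus.new(2, 1, 2, Time.utc(2012, 8, 2), false)) # car 1 moves on
store.save(FlaggedStatus.new(3, 2, 1, Time.utc(2012, 8, 3), false))
store.save(FlaggedStatus.new(4, 2, 3, Time.utc(2012, 7, 1), false)) # older row saved late
p store.car_ids_at(1) # => [2]
```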