6

I'm trying to list the number of users by age-range:

Range  : #Users
10-14  : 16
15-21  : 120
22-29  : 312
30-40  : 12131
41-70  : 612
71-120 : 20

I was thinking of creating a static array of hashes:

AGE_RANGES = [
  {label:"10 - 14", min:10, max:14},
  {label:"15 - 21", min:15, max:21},
  {label:"22 - 29", min:22, max:29},
  {label:"30 - 40", min:30, max:40},
  {label:"41 - 70", min:41, max:70},
  {label:"71 - 120", min:71, max:120}
]

and then use it for my search filter, as well as for my query. But, I cannot think of a way of getting the most performance out of it.

My method in my model only groups by age:

def self.group_by_ageRange(minAge, maxAge)

  query = User.group("users.age")
              .where("users.age BETWEEN minAge and maxAge ")
              .select("users.age,
                        count(*) as number_of_users")

end

Any suggestions?

mu is too short
  • 426,620
  • 70
  • 833
  • 800
MrWater
  • 1,797
  • 4
  • 20
  • 47

2 Answers2

8

You want to build some SQL that looks like this:

select count(*),
       case
           when age between 10 and 14 then '10 - 14'
           when age between 15 and 21 then '15 - 21'
           -- ...
       end as age_range
from users
where age between 10 and 120
group by age_range

In ActiveRecord terms, that would be:

# First build the big ugly CASE, we can also figure out the
# overall max and min ages along the way.
min   = nil
max   = nil
cases = AGE_RANGES.map do |r|
    min = [r[:min], min || r[:min]].min
    max = [r[:max], max || r[:max]].max
    "when age between #{r[:min]} and #{r[:max]} then '#{r[:min]} - #{r[:max]}'"
end

# Then away we go...
age_ranges = Users.select("count(*) as n, case #{cases.join(' ')} end as age_range")
                  .where(:age => min .. max)
                  .group('age_range')
                  .all

That will leave you with an array of objects in age_ranges and those objects will have n and age_range methods. If you want a Hash out of that, then:

age_ranges = Hash[age_ranges.map { |r| [r.age_range, r.n] }]

That won't include ranges that don't have any people in them of course; I'll leave that as an exercise for the reader.

mu is too short
  • 426,620
  • 70
  • 833
  • 800
  • this solved my issue, thank you. I now seem to be having some rails bug inside the controller... If i add `logger.debug("items: #{@ageRange_items.inspect}")` it all goes fine...If not, it just sets the `cases.join(' ')` as `age_range`, giving a natural error saying `column age_range does not exist` – MrWater Aug 24 '12 at 00:17
  • @itsalltime: That's hard to diagnose without seeing the final code. – mu is too short Aug 24 '12 at 00:36
  • here it goes...it's a bit more complex than the question i posted, since it has chain joins, but it shouldn't have this behaviour `query = User.joins(shops: {receipts: {tag: :user}}) .select("case #{cases.join(' ')} end as age_range, count(*) as number_of_users, sum(total) as total") .where("users.id= :user_id ", user_id: user) .where("users_tags.age" => min .. max) .group("age_range")` – MrWater Aug 24 '12 at 00:40
  • and in my controller i have `@ageRange_items = current_user.get_dashboard(nil,nil,nil,"ageRange").paginate(page: params[:page])` . If i leave it like that i have the mentioned error. If I add a `logger.debug("#{@ageRange_items.inspect}")` then it all goes smoothly, and the page is loaded... – MrWater Aug 24 '12 at 00:41
  • and here the part of the error that matters `SQLite3::SQLException: no such column: age_range: SELECT COUNT(*) AS count_all, age_range AS age_range FROM "users" INNER JOIN` – MrWater Aug 24 '12 at 00:49
  • Where is `count_all` coming from? – mu is too short Aug 24 '12 at 00:57
  • it seems to come from ActiveRecord definition on activerecord-3.2.5/lib/relation/calculations.rb file: `if operation == 'count' && column_name == :all aggregate_alias = 'count_all' else aggregate_alias = column_alias_for(operation, column_name) end` – MrWater Aug 24 '12 at 01:08
  • this seems to be executed with the .any? on my view : `<% if @ageRange_items.any? %>` – MrWater Aug 24 '12 at 01:12
  • it really seems a rails bug. If i use .nil? instead of .any? it all goes fine also. with more digging, it seems to go down to the empty? method. – MrWater Aug 24 '12 at 01:23
  • 1
    Sounds like `@ageRange_items` might still be a relation rather than an array when it gets into your view. I don't see a `.all` or `.to_a` anywhere so you might not be getting the `any?` that you think you are. `if @ageRange_items.present?` would probably be more idiomatic anyway. – mu is too short Aug 24 '12 at 01:35
  • adding the .all to the query, or the to_a to the controller does the trick...(the .present? only works also if i make it an array) but this still doesn't explain why adding the logger.debug(@ageRange.inspect) would solve the problem too... – MrWater Aug 24 '12 at 01:44
  • btw, do you know whats the reasoning on adding a .all ? I find it reasonable having User.all, but User.where(...).all doesn't seem very reasonable... – MrWater Aug 24 '12 at 01:53
  • 1
    ActiveRecord won't touch the database until you ask for some results by calling `.all`, `.to_a`, `.each`, ... When you say `x = M.where(...)`, you get a relation object back so that you can chain methods to build your query: `M.where(...).where(...).order(...).where(...)`. When you `x.all`, you talk to the database and get some results. I'd guess that `x.inspect` gets the results and caches them. You'll notice that [`any?`](http://api.rubyonrails.org/classes/ActiveRecord/Relation.html#method-i-any-3F) is implemented directly by ActiveRecord::Relation so you need `all` or you get AR's `any?` – mu is too short Aug 24 '12 at 02:01
2

I find the accepted answer to be a bit dense. Fast but hard to understand and write. Today, I came up with a slower but simpler solution. Since we are grouping ages into ranges, we can assume that we won't have values over 125

That means that if you use a ruby filter on a grouped and counted result set, you won't iterate over more than 125 items. This will be slower than a sql range based group/count, but it was fast enough for my purposes while still relying on the DB for most of the heavy lifting. Iterating over a hash with less than 125 items doesn't seem like a big deal. Especially when the key value pairs are just ints like this:

{
  0 => 0,
  1 => 1,
  3 => 5,
  25 => 3,
  99 => 3
}

Here's the psudo-code:

users = User
  .where(age: (min..max))
  .group(:age)
  .count(:age)
group = Hash.new(0)
users.each{|age, count|
      case
      when age <= 10
        group['under 10'] += count
      when age <= 25
        group['11-25'] += count
      when age <= 40
        group['26-40'] += count
      else
        group['41+'] += count
      end
}

Note: this solution provides the count of users in a given range.

Jared Menard
  • 2,654
  • 1
  • 21
  • 22