3

In my rails app (using postgresql), I'm trying to compose an Active Record query to find a group of volunteer records and then order them by first_name, then last_name, then email. Additionally, first_name and last_name may be null (either both will be null or both will be not null). For example, I would want to following list to be sorted thusly:

  1. Volunteer [first_name: 'Alex', last_name: 'Diego', email: 'a.diego@person.com']
  2. Volunteer [first_name: null, last_name: null, email: 'cxxr@person.com']
  3. Volunteer [first_name: 'Josh', last_name: 'Broger', email: 'broger@person.com']
  4. Volunteer [first_name: 'Josh', last_name: 'Broger', email: 'jcool@person.com']
  5. Volunteer [first_name: 'Josh', last_name: 'Kenton', email: 'aj@person.com']

Originally, I had the following code:

Volunteer.joins(:volunteer_lists).
  where("(volunteer_lists.organizer_id = ? AND organizer_type = 'Organization') OR
  (volunteer_lists.organizer_id IN (?) AND organizer_type = 'Collaborative')",
  self.organization.id, collaboratives).uniq.
  order(:first_name, :last_name, :email)

This code works except the results are grouped by volunteers with first_name & last_name first, other volunteers with only email last (so in the example list above, volunteer #2 would be last). The answer to this helpful post indicates that I should use a COALESCE() function in the ORDER BY part of the statement to get the results I want. Awesome! So I updated my code to the following:

Volunteer.joins(:volunteer_lists).
  where("(volunteer_lists.organizer_id = ? AND organizer_type = 'Organization') OR
  (volunteer_lists.organizer_id IN (?) AND organizer_type = 'Collaborative')",
  self.organization.id, collaboratives).uniq.
  .order('COALESCE("volunteers"."first_name", "volunteers"."email") ASC, COALESCE("volunteers"."last_name", "volunteers"."email") ASC, "volunteers"."email" ASC')

The problem is that this code now returns

PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list

Using to_sql on both versions of the code, I find they are exactly the same except for the addition of the COALESCE() function.

to_sql of original, working, code:

SELECT  "organizations".* FROM "organizations" WHERE "organizations"."id" = $1 LIMIT 1  [["id", 1]]
  => "SELECT DISTINCT \"volunteers\".* FROM \"volunteers\" INNER JOIN \"volunteer_list_connectors\" ON \"volunteer_list_connectors\".\"volunteer_id\" = \"volunteers\".\"id\" INNER JOIN \"volunteer_lists\" ON \"volunteer_lists\".\"id\" = \"volunteer_list_connectors\".\"volunteer_list_id\" WHERE ((volunteer_lists.organizer_id = 1 AND organizer_type = 'Organization') OR\n      (volunteer_lists.organizer_id IN (1) AND organizer_type = 'Collaborative'))  ORDER BY \"volunteers\".\"first_name\" ASC, \"volunteers\".\"last_name\" ASC, \"volunteers\".\"email\" ASC"

to_sql of updated code (the only difference is after ORDER BY):

SELECT  "organizations".* FROM "organizations" WHERE "organizations"."id" = $1 LIMIT 1  [["id", 1]]
  => "SELECT DISTINCT \"volunteers\".* FROM \"volunteers\" INNER JOIN \"volunteer_list_connectors\" ON \"volunteer_list_connectors\".\"volunteer_id\" = \"volunteers\".\"id\" INNER JOIN \"volunteer_lists\" ON \"volunteer_lists\".\"id\" = \"volunteer_list_connectors\".\"volunteer_list_id\" WHERE ((volunteer_lists.organizer_id = 1 AND organizer_type = 'Organization') OR\n      (volunteer_lists.organizer_id IN (1) AND organizer_type = 'Collaborative'))  ORDER BY COALESCE(\"volunteers\".\"first_name\", \"volunteers\".\"email\") ASC, COALESCE(\"volunteers\".\"last_name\", \"volunteers\".\"email\") ASC, \"volunteers\".\"email\" ASC"

I tested trying my new code without .uniq (to remove the DISTINCT part of the sql) and when I do this the new code runs without error, however the results are NOT sorted properly: they are sorted the same way as my original code is (the code without COALESCE()).

I imagine that there is a syntax error that I've committed, but I can't figure out what it is (or perhaps I'm wrong and COALESCE() is not the proper solution to my problem).

Any help is GREATLY appreciated!!

UPDATE & ANSWER

After getting invaluable help from Kristján and his answer below, I solved what turned out to be multiple problems:

  1. When you add .uniq to an ActiveRecord query, it adds DISTINCT to the sql that gets sent to the database. SELECT DISTINCT has some stricter requirements than simply SELECT. As pointed out by Kristján and described in more detail in this S.O. answer, the DISTINCT expression(s) must match the leftmost ORDER BY expression(s). When I updated .order() with my sql fragment including COALESCE(), I also needed to add a matching sql fragment to the SELECT part of the statement with .select().
  2. 1 above just removes the error I was getting. At that point, my query was running but the results were being sorted the same as they were before using COALESCE(). Kristján provides a proper description in his answer below, but turns out my query was running correctly, its just that COALESCE() sorts anything uppercase before anything lowercase. so "Z" will be sorted in front of "a". This problem can be solved by adding a function to convert the COALESCE() fields to lowercase using LOWER().

Here is my answer:

    Volunteer.select('LOWER(COALESCE("volunteers"."first_name", "volunteers"."email")), LOWER(COALESCE("volunteers"."last_name", "volunteers"."email")), LOWER("volunteers"."email"), "volunteers".*').
      joins(:volunteer_lists).
      where("(volunteer_lists.organizer_id = ? AND organizer_type = 'Organization') OR
      (volunteer_lists.organizer_id IN (?) AND organizer_type = 'Collaborative')",
      self.organization.id, collaboratives).uniq.
      order('LOWER(COALESCE("volunteers"."first_name", "volunteers"."email")) ASC, LOWER(COALESCE("volunteers"."last_name", "volunteers"."email")) ASC, LOWER("volunteers"."email") ASC')

Note:

My answer above actually created another problem when I later call .count on the query. .count breaks because of the custom .select() fragment I've added. To solve this, I needed to add a custom volunteers_count method to the User model that didn't make use of the .select() fragment.

Community
  • 1
  • 1
John
  • 9,249
  • 5
  • 44
  • 76
  • `DISTINCT` is my kryptonite,`COALESCE` does sound right here, but looking at your code I'm not sure how to fix it. If you don't have a ton of users, I think this would be a ton easier in Ruby; but it can probably be done with some mystical SQL – Tom Prats Apr 13 '16 at 04:59
  • Unfortunately I have thousands of volunteers and that number is only increasing. I was originally doing it with ruby, but I'm trying to increase performance. – John Apr 13 '16 at 05:05

2 Answers2

3

You're running in to a letter case problem: Your names are all capitalized, but the emails are lowercase, and with most collations, uppercase letters come before lowercase. Check out this trivial example:

#= select * from (values ('b'), ('B'), ('a'), ('A')) t (letter);
 letter
--------
 b
 B
 a
 A
#= select * from (values ('b'), ('B'), ('a'), ('A')) t (letter) order by letter;
 letter
--------
 A
 B
 a
 b

So your query is actually working perfectly, it's just that cxxr@person.com sorts after Josh. To avoid this, you can sort by the lowercase value. Here's a simple version of the data you have:

#= select * from volunteers;
 first_name | last_name |       email
------------+-----------+--------------------
 Josh       | Broger    | jcool@person.com
 Josh       | Kenton    | aj@person.com
 ∅          | ∅         | cxxr@person.com
 Josh       | Broger    | broger@person.com
 Alex       | Diego     | a.diego@person.com

Then to sort using the coalesce you're after:

#= select * from volunteers order by lower(coalesce(first_name, email));
 first_name | last_name |       email
------------+-----------+--------------------
 Alex       | Diego     | a.diego@person.com
 ∅          | ∅         | cxxr@person.com
 Josh       | Broger    | broger@person.com
 Josh       | Broger    | jcool@person.com
 Josh       | Kenton    | aj@person.com

Or for your full version using ActiveRecord:

Volunteer
  .joins(:volunteer_lists)
  .where(
    "(volunteer_lists.organizer_id = ? AND organizer_type = 'Organization') OR (volunteer_lists.organizer_id IN (?) AND organizer_type = 'Collaborative')",
    organization.id, collaboratives
  )
  .order('LOWER(COALESCE("volunteers"."first_name", "volunteers"."last_name", "volunteers"."email"))')
Kristján
  • 18,165
  • 5
  • 50
  • 62
  • Oh my god you are an AMAZING human being! THANK YOU!! THANK YOU!!!! THANK YOU!!!!! It sorts properly!!!! Thank you SO much. I never would have figured that out on my own. That's exactly what was happening. Unfortunately, it STILL returns the same `PG::InvalidColumnReference` error when `.uniq` (`DISTINCT`) is included in the query. Do you know why that is? The error message indicates that `ORDER BY` expressions must appear in the select list, yet, as evidenced by my `to_sql` test, the ORDER BY expression with `LOWER(COALESCE())` appears in the same spot as the query without `LOWER(COALESCE())` – John Apr 14 '16 at 06:58
  • If you don't know what's happening with `DISTINCT`, I'll update my question to specifically address the problem you've answered and then mark your answer as correct. You seem to have shown that my query actually had two errors in it. One that I knew about (the subject of my question) and one that I didn't (your answer). – John Apr 14 '16 at 07:04
  • Regardless, I cannot thank you enough for your answer though. I don't see myself EVER figuring that out on my own. I REALLY(!!!!) appreciate your taking the time to answer my question! – John Apr 14 '16 at 07:05
  • All that `distinct` error is saying is that you have to include the value in `order by` in the `select` clause as well, so something like `select distinct lower(coalesce(first_name, last_name, email)), * from ... order by lower(...)`. But since you currently have `distinct *`, that's not going to accomplish a lot unless your tables have a ton of exactly duplicated rows in them - what is it you're trying to make distinct? – Kristján Apr 14 '16 at 07:51
  • THANK YOU! That was the information I needed. I did some more research and got everything working. I've updated my question with the complete answer to my specific problem at the end because, turns out, there were several problems. I SO appreciate you helping me out here. Couldn't have done it without you. – John Apr 14 '16 at 10:01
  • (and for interests sake: I'm building a volunteer management app for a few nonprofits I volunteer with--their current setup is horrendous. For the current user, I'm grabbing all of the orgs they have access to, grabbing all of the volunteer lists associated with those orgs, and then grabbing all of the volunteers on the volunteer lists, sorting them, and removing duplicates. I set out to learn ruby and make this app for them about a year ago--> had no idea what I was in for. Anyway, this code is actually a refactoring of the original code I baked into the system when I knew less than I do now – John Apr 14 '16 at 10:04
-3

Just throwing it out there but have you tried (mysql)

ORDER BY volunteers ASC THEN last_name ASC THEN email ASC

(wanted to make this a comment but not enough rep :((( ) if this isnt good please just comment so I can take it down so i dont lose rep :)

Max Uland
  • 87
  • 10
  • I just looked into `THEN` and I don't see it used in `ORDER BY` statements, just case statements, so I don't think this helps me. But seriously, thanks for the input! It cased me to try a few more possible solutions. Still haven't figured it out though :( – John Apr 13 '16 at 04:55
  • Try OR as well and , i had a few random queries that ended up with having weird input to get it right. Good luck!! – Max Uland Apr 13 '16 at 16:12