1

My application has the concept of registration questions and answers

class RegistrationQuestion < ApplicationRecord
  has_many :registration_answers, dependent: :destroy

  # RegistrationQuestions that are 'original' questions are the questions
  # configured to join as a full member, they have a null original_id field.

  # However we also support 'one time events'. When created those events get a copy of the full
  # registration questions so that users can have different questions for one time events compared to full membership

  belongs_to :original, class_name: "RegistrationQuestion", optional: true
  has_many :derived_questions, class_name: "RegistrationQuestion", foreign_key: 'original_id'
end

In order to give returning users suggested answers, I'd like to retrieve the most recent answer for any related question. Here's where I'm getting tripped up:

class RegistrationAnswer < ApplicationRecord
  belongs_to :registration_question

  # Simplified. I can get the related question ids, need help getting distinct answers
  def self.related_answers_for(profile, question_ids)
    recent_related_answer_ids = RegistrationAnswer.joins(:registration_question)
                                                  .where(profile: profile, registration_question_id: related_question_ids)
                                                  .order(updated_at: :desc)
                                                  .select('registration_questions.original_id', :updated_at, :id)
                                                  .group('registration_questions.original_id')
                                                  .map { |ra| ra.id }

    RegistrationAnswer.where(id: related_answer_ids)
  end
end

I'm trying to get the most recent answer for any related registration question. Related registration questions are ones with the same original_id set.

However this sql throws a "PG::GroupingError: ERROR" telling me I need to put both updated_at and id in the group clause. However, when I add those two it is no longer grouping them as I want, since it's returning the group distinct based on the registration answer id. What I really want is the most recent answer, grouped by the registration_question.original id only.

jarlh
  • 42,561
  • 8
  • 45
  • 63
Msencenb
  • 5,675
  • 11
  • 52
  • 84
  • Does changing the order of the group arguements change the grouping of the results? – Int'l Man Of Coding Mystery Apr 08 '21 at 19:42
  • No, order seems to stay the same regardless of group argument order – Msencenb Apr 08 '21 at 20:21
  • You could try a window function. Im not an expert with them but you might be able to use one to do what you need – Int'l Man Of Coding Mystery Apr 08 '21 at 22:27
  • I don't know activerecord, but if I were writing a query to do this, I would use the postgres specific syntax `DISTINCT ON` and not group by. – Jeremy Apr 09 '21 at 01:13
  • This should be easy to optimize given actual Postgres table definitions (`CREATE TABLE` statements) and cardinalities. But I don't see through the obfuscation layer. Basic instructions: https://stackoverflow.com/a/7630564/939860 – Erwin Brandstetter Apr 12 '21 at 18:59
  • Although this does not solve you problem in Rails, it might give you a better understanding about the problem you are facing. [Select first row in each GROUP BY group?](https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group) With complex queries it is good practice to know what query you want to write before using Rails/Arel. – 3limin4t0r Apr 19 '21 at 13:18

1 Answers1

1

Since you are using a Postgres DB you can use a ROW_NUM() https://www.postgresqltutorial.com/postgresql-row_number/ where you number the rows based on the original_id and the updated_at order then you will query only the first record (most_recent) which is the row with row_number = 1

you can use a find_by_sql which will return the attributes queried and allows you to write SQL directly in RoR

https://apidock.com/rails/v6.1.3.1/ActiveRecord/Querying/find_by_sql

RegistrationAnswer.find_by_sql(
    <<~SQL.squish
      with tbl_numbered as (
          SELECT 
            registration_questions.original_id, registration_answers.id, registration_answers.updated_at
            ROW_NUMBER() OVER (PARTITION BY registration_questions.original_id ORDER BY registration_answers.updated_at DESC) as rk
          FROM 
            registration_answers
          JOIN registration_questions ON registration_question.id = registration_answers.registration_question_id
          WHERE registration_answers.profile = ? AND registration_question_id = ?
      )

      select id, updated_at from tmp_table where rk = 1
    SQL,
    [profile, related_question_ids]
)

this will returns an instance of RegistrationAnswer with id and updated_at only you can check what they return then map the ids and re-query the table for the records

OR

you can use the select in ruby to filter by the ROW_NUM which might have performance issues depending how big your table is

recent_related_answers = RegistrationAnswer.joins(:registration_question).
                            where(profile: profile, registration_question_id: related_question_ids).
                            select("registration_questions.original_id", "updated_at", "id", "ROW_NUMBER() OVER (PARTITION BY registration_questions.original_id ORDER BY registration_answers.updated_at DESC) as RK")

                                                  
recent_related_answer_ids = recent_related_answers.select { |record| record.RK == 1 }.map(&:id)

the queries might needs some updated as I'm not sure how your schema looks like - let me know if it doesn't work

Mshka
  • 1,798
  • 1
  • 10
  • 19
  • Thanks for writing this up. Although I have not tested this in my own app, this feels like what I was looking for. I've awarded the bounty, I appreciate the answer! – Msencenb Apr 19 '21 at 15:20
  • Thanks mate! let me know if it doesn't work – Mshka Apr 19 '21 at 15:32