0

I am building a "quiz app" in Python, and I need to store results in a SQL database. I want to use SQLAlchemy Python library to interact with the database. Each user of my app will be asked 3 randomly selected questions from a predetermined set of 100 possible questions. Each question can only be answered "Yes" or "No" (i.e. True or False). I store answers in a table defined as follows:

class Answer(Base):
    __tablename__ = "Answers"
    
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("Users.id"), nullable=False)
    question_id = Column(Integer)
    answer = Column(Boolean, nullable=False)
    
    user = relationship("User", back_populates="answers")

After all users complete the quiz, I calculate how many times a certain question was answered by users:

tot_each_question = (db_session
                     .query(Answer.question_id,
                            count_questions.label("tot_answers_for_question"))
                     .group_by(Answer.question_id)
                     )

I can also calculate how many times a certain question was answered "Yes" (i.e. True) by users:

tot_true_for_question = (db_session
                         .query(Answer.question_id,
                                count_questions.label("tot_true_for_question"))
                         .filter(Answer.answer == True)
                         .group_by(Answer.question_id)
                         )

How do I calculate the percentage each question was answered "Yes" by users, with SQLAlchemy? I can easily do that with basic Python dictionaries:

dict_tot_each_question = {row.question_id: row.tot_answers_for_question
                          for row in tot_each_question.all()}

dict_tot_true_for_question = {row.question_id: row.tot_true_for_question
                              for row in tot_true_for_question.all()}

dict_percent_true_for_question = {}
for question_id, tot_answers in dict_tot_each_question.items():
    tot_true = dict_tot_true_for_question.get(question_id, 0)
    percent_true = tot_true / tot_answers * 100
    dict_percent_true_for_question[question_id] = percent_true

But I prefer to use SQLAlchemy functionality to obtain the same result. Is it possible to do that in SQLAlchemy? Would it be convenient and efficient to do that in SQLAlchemy, or would my solution based on Python dictionary be better for any reason?

Enrico Gandini
  • 855
  • 5
  • 29

1 Answers1

2

Just combining two expressions from the two queries you already have into one will give you the desired result:

q = (
    session.query(
        Question.id,
        (100 * func.sum(cast(Answer.answer, Integer)) / func.count(Answer.answer)).label("perc_true"),
    )
    .outerjoin(Answer)
    .group_by(Question.id)
)

As you can see above, i used COUNT function for all the answers.

Another item to note is that my query starts with the Question and JOINs the Answer table. The reason for this is that in case there is the Question with no answers, you will still see the (#id, NULL) returned instead of not seeing a row at all if you use only Answers table. But if you do not care about this corner case being handled as I see it, you could do it your way:

q = (
    session.query(
        Answer.question_id,
        (100 * func.sum(Answer.answer) / func.count(Answer.answer)).label("perc_true"),
    )
    .group_by(Answer.question_id)
)

Finally, one more assumption i made is that your database will handle the true as 1 for the sake of proper SUM after casting to Integer. Shall this not be the case, please refer to multiple answers in this question on how to handle this: postgresql - sql - count of `true` values


BONUS:

When i find myself asking for some aggregation related questions on the model level, i often implement these on the model directly using the Hybrid Attributes extension.

Code below will give you and indication on how you could use it for your case:

class Answer(Base):
    __tablename__ = "answers"

    id = Column(Integer, primary_key=True)
    # user_id = Column(Integer, ForeignKey("users.id"), nullable=False)
    question_id = Column(Integer, ForeignKey("questions.id"))
    answer = Column(Boolean, nullable=False)

    # user = relationship("User", back_populates="answers")
    question = relationship("Question", back_populates="answers")


class Question(Base):
    __tablename__ = "questions"

    id = Column(Integer, primary_key=True)
    question = Column(String, nullable=False)

    answers = relationship("Answer", back_populates="question")

    @hybrid_property
    def answers_cnt(self):
        return len(list(self.answers))

    @hybrid_property
    def answers_yes(self):
        return len(list(_ for _ in self.answers if _.answer))

    @hybrid_property
    def answers_yes_percentage(self):
        return (
            100.0 * self.answers_yes / self.answers_cnt if self.answers_cnt != 0 else None
        )

    @answers_cnt.expression
    def answers_cnt(cls):
        return (
            select(func.count(Answer.id))
            .where(Answer.question_id == cls.id)
            .label("answers_cnt")
        )

    @answers_yes.expression
    def answers_yes(cls):
        return (
            select(func.count(Answer.id))
            .where(Answer.question_id == cls.id)
            .where(Answer.answer == True)
            .label("answers_yes")
        )

    @answers_yes_percentage.expression
    def answers_yes_percentage(cls):
        return (
            case(
                [(cls.answers_cnt == 0, None)],
                else_=(
                    100
                    * cast(cls.answers_yes, Numeric)
                    / cast(cls.answers_cnt, Numeric)
                ),
            )
        ).label("answers_yes_percentage")

In this case you can do the calculations both in python or using a query.

  1. Python (this will load all Answers from the database, so not efficient if the data is not yet loaded into memory)

     q = session.query(Question)
     for question in q:
         print(question, question.answers_yes_percentage)
    
  2. Database: this is very efficient because you just run one query, similar to the separate query in the answer you are looking into, but the result is returned separately and as a property on the model

     q = session.query(Question, Question.answers_yes_percentage)
     for question, percentage in q:
         print(question, percentage)
    

Please note that above works with 1.4 version of sqlalchemy, but might need other syntax for prior versions.

van
  • 74,297
  • 13
  • 168
  • 171
  • Thanks @van, it works perfectly! I really appreciated the explanations and the two options to achieve the desired outcome. Just one note: your method performs the "integer division": the fractional part of the numbers is discarded, the number is always rounded down. Is there a way to perform the "float division" instead, obtaining the same number that I obtain using Python `/` operator? – Enrico Gandini Apr 01 '21 at 08:45
  • 1
    Try replacing `100` with `100.1`. If this is not sufficient, i will change the answer with `CAST` operator. – van Apr 01 '21 at 08:48
  • Yeah, It works well! Using `100.0001` changes the correct number by a negligible amount. – Enrico Gandini Apr 01 '21 at 08:52
  • Out of curiosity, I also tried the `cast` function: `from sqlalchemy import cast, Float`. Then, substituting `func.sum(Answer.answer)` with `cast(func.sum(Answer.answer), Float)` returns the exact same float number obtained with basic Python! – Enrico Gandini Apr 01 '21 at 08:54
  • I still have to wait 11 hours to award the bounty, but it will be yours, thank you very much @van! – Enrico Gandini Apr 01 '21 at 08:57
  • 1
    You are most welcome. I will amend the answer with another usage you might find useful, please feel free to explore it further using sqlalchemy documentation. – van Apr 01 '21 at 09:56
  • Thanks @van, I really appreciate the bonus explanation, since I am trying to learn SQLAlchemy and to use all its interesting features! – Enrico Gandini Apr 01 '21 at 10:22