1

I'm building a Reddit clone that allows users to create posts, vote on other people's posts, and leave comments on other people's posts. When I query the database for a list of all of the posts, I would like my query to return data like the example table below.

Anticipated Table

post_id Number of Upvotes Number of Downvotes Number of Comments
1 2 0 5
2 1 0 0
3 0 1 0
4 0 1 4

Unfortunately, when I query the database, the result is unexpected.

Unexpected Table

post_id Number of Upvotes Number of Downvotes Number of Comments
1 10 0 5
2 1 0 0
3 0 1 0
4 0 4 4

The number of upvotes becomes equal to the number of comments on a post multiplied by the number of upvotes or downvotes. Below is the code.

SQL Alchemy Query and SQL Version

posts_query = db.query(
      models.Post.id,
      func.count(models.Vote.post_id).filter(models.Vote.upvote == True).label("num_upvotes"),
      func.count(models.Vote.post_id).filter(models.Vote.upvote == False).label("num_downvotes"),
      func.count(models.Comment.post_id).label('num_comments')
    ).join(
      models.Vote, models.Vote.post_id == models.Post.id, isouter=True
    ).join(
      models.Comment, models.Comment.post_id == models.Post.id, isouter=True
    ).group_by(
      models.Post.id
    ).all()
SELECT posts.id AS posts_id,
count(votes.post_id) FILTER (WHERE votes.upvote = true) AS num_upvotes,
count(votes.post_id) FILTER (WHERE votes.upvote = false) AS num_downvotes,
count(comments.post_id) AS num_comments 
FROM posts
LEFT OUTER JOIN votes ON votes.post_id = posts.id
LEFT OUTER JOIN comments ON comments.post_id = posts.id
GROUP BY posts.id

SQL Alchemy Models

class Post(Base):
  __tablename__ = "posts"

  id = Column(Integer, primary_key=True, nullable=False)
  title = Column(String, nullable=False)
  content = Column(String, nullable=False)
  published = Column(Boolean, server_default="True", nullable=False)
  created_at = Column(TIMESTAMP(timezone=True), 
    server_default=text('now()'), nullable=False)
  owner_id = Column(Integer, ForeignKey("users.id", 
    ondelete="CASCADE"), nullable=False)
  owner = relationship("User", backref="posts")


class User(Base): 
  __tablename__ = "users"

  id = Column(Integer, primary_key=True, nullable=False)
  username = Column(String, nullable=False, unique=True)
  email = Column(String, nullable=False, unique=True)
  password = Column(String, nullable=False)
  created_at = Column(TIMESTAMP(timezone=True), 
    server_default=text('now()'), nullable=False)


class Vote(Base):
  __tablename__ = "votes"

  user_id = Column(Integer, ForeignKey("users.id", 
    ondelete="CASCADE"), primary_key=True)
  user = relationship("User", backref="votes")
  post_id = Column(Integer, ForeignKey("posts.id", 
    ondelete="CASCADE"), primary_key=True)
  post = relationship("Post", backref="votes")
  upvote = Column(Boolean, nullable=False)

class Comment(Base):
  __tablename__ = "comments"

  id = Column(Integer, primary_key=True, nullable=False)
  owner_id = Column(Integer, ForeignKey("users.id", 
    ondelete="CASCADE"))
  owner = relationship("User", backref="comments")
  post_id = Column(Integer, ForeignKey("posts.id", 
    ondelete="CASCADE"))
  post = relationship("Post", backref="comments")
  content = Column(String, nullable=False)
  created_at = Column(TIMESTAMP(timezone=True), 
    server_default=text('now()'), nullable=False)

First Query

I've tried removing the second Left Join on the comments table and the query that attempts to count the number of comments. The following query makes the number of upvotes and downvotes act correctly.

posts_query = db.query(
      models.Post.id,
      func.count(models.Vote.post_id).filter(models.Vote.upvote == True).label("num_upvotes"),
      func.count(models.Vote.post_id).filter(models.Vote.upvote == False).label("num_downvotes"),
    ).join(
      models.Vote, models.Vote.post_id == models.Post.id, isouter=True
    ).group_by(
      models.Post.id
    ).all()
SELECT posts.id AS posts_id,
count(votes.post_id) FILTER (WHERE votes.upvote = true) AS num_upvotes,
count(votes.post_id) FILTER (WHERE votes.upvote = false) AS num_downvotes, 
FROM posts
LEFT OUTER JOIN votes ON votes.post_id = posts.id
GROUP BY posts.id
post_id Number of Upvotes Number of Downvotes
1 2 0
2 1 0
3 0 1
4 0 1

Second Query

I've also tried only removing the function that counts the number of comments while leaving the second LEFT OUTER JOIN in place.

posts_query = db.query(
      models.Post.id,
      func.count(models.Vote.post_id).filter(models.Vote.upvote == True).label("num_upvotes"),
      func.count(models.Vote.post_id).filter(models.Vote.upvote == False).label("num_downvotes")
    ).join(
      models.Vote, models.Vote.post_id == models.Post.id, isouter=True
    ).join(
      models.Comment, models.Comment.post_id == models.Post.id, isouter=True
    ).group_by(
      models.Post.id
    ).all()
SELECT posts.id AS posts_id,
count(votes.post_id) FILTER (WHERE votes.upvote = true) AS num_upvotes,
count(votes.post_id) FILTER (WHERE votes.upvote = false) AS num_downvotes 
FROM posts
LEFT OUTER JOIN votes ON votes.post_id = posts.id
LEFT OUTER JOIN comments ON comments.post_id = posts.id
GROUP BY posts.id
post_id Number of Upvotes Number of Downvotes
1 10 0
2 1 0
3 0 1
4 0 4

This leads me to believe the issue is with the second LEFT OUTER JOIN rather than the other counting function.

When I attempt to add the second LEFT OUTER JOIN to my query, the count that keeps track of the number of upvotes and downvotes returns an incorrect value. Why is my query giving me the Unexpected Table above and what I would need to change to return the Anticipated Table instead?

philipxy
  • 14,867
  • 6
  • 39
  • 83
  • Please ask 1 specific researched non-duplicate question. Please either ask about 1 bad definition/query/function with the obligatory [mre] & why you think it should return something else at the 1st subexpression that it doesn't give what you expect, justified by reference to authoritative documentation, or ask about your overall goal giving working parts you can do & ideally a [mre]. But please ask about bad code 1st because misconceptions get in the way of your goal. And bad code doesn't tell us what you wish it would do. [ask] [Help] – philipxy Dec 12 '22 at 05:55
  • Debug questions require a [mre]--cut & paste & runnable code including initialization; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL include DDL & tabular initialization code. For debug that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. Please debug one framework level at a time. – philipxy Dec 12 '22 at 06:01
  • Please before considering posting: Pin down code issues via [mre]. Read the manual/reference & google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular names/strings/numbers, 'site:stackoverflow.com' & tags; read many answers. Reflect research in posts. SO/SE search is poor & literal & unusual, read the help. Google re googling/searching, including Q&A at [meta] & [meta.se]. [How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/q/261592/3404097) Questions from this post are faqs. – philipxy Dec 12 '22 at 06:04
  • [Two SQL LEFT JOINS produce incorrect result](https://stackoverflow.com/q/12464037/3404097) – philipxy Dec 12 '22 at 20:02

1 Answers1

2

I don't know Alchemy, so here is my answer on SQL:

Your mistake is that you cross join a post's votes with its comments. If you have two votes and three comments on a post, your joins create all six combinations (2 x 3 = 6). Then you count, but as you count in the cross joined result, you are counting votes and comments multifold.

What you want to do instead is join vote counts and comment counts to the posts:

SELECT 
  p.id AS posts_id,
  COALESCE(v.cnt_up, 0) AS num_upvotes,
  COALESCE(v.cnt_down, 0) AS num_downvotes,
  COALESCE(c.cnt, 0) AS num_comments 
FROM posts p
LEFT OUTER JOIN
(
  SELECT
    post_id,
    COUNT(*) FILTER (WHERE upvote = true) AS cnt_up,
    COUNT(*) FILTER (WHERE upvote = false) AS cnt_down
  FROM votes
  GROUP BY post_id
) v ON v.post_id = p.id
LEFT OUTER JOIN
(
  SELECT post_id, COUNT(*) AS cnt
  FROM comments
  GROUP BY post_id
) c ON c.post_id = p.id
ORDER BY p.id;

Well, I probably shouldn't mention this, as it is considered bad practise, due to creating an unnecessarily large intermediate result and being prone to errors, when trying to add other aggregation results, but well, here it is: As long as the only aggregation function you use is COUNT, you can use your query with distinct ID counts:

SELECT
  p.id AS posts_id,
  COUNT(DISTINCT v.id) FILTER (WHERE v.upvote = true) AS num_upvotes,
  COUNT(DISTINCT v.id) FILTER (WHERE v.upvote = false) AS num_downvotes,
  COUNT(DISTINCT c.id) AS num_comments 
FROM posts p 
LEFT OUTER JOIN votes v ON voves.post_id = p.id
LEFT OUTER JOIN comments c ON c.post_id = p.id
GROUP BY p.id
ORDER BY p.id;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73