0

I'm interested in finding the find the most efficient way to select records from a table based on a boolean attribute.

My scenario is I have a table that holds comments made by users, I want to select all the comments that are flagged as inappropriate.

The two ideas I had to implement this are:

  1. The most logical one, have a boolean attribute 'isFlagged' in the comments table

    The table would have a primary index

    I would execute:

    SELECT description FROM Comment 
    WHERE isFlagged = 1;
    
  2. Have a table that holds the primary keys of records from the Comment table that have been flagged

    Both tables would have a primary index

    I would execute:

    SELECT description FROM Comment 
    WHERE commentID IN (SELECT FK_commentID FROM FlaggedComment)
    

Which would execute the most efficiently?

Is there a better way to do this?

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
nato
  • 1

2 Answers2

1

I'd go for the first one if and only if you have an index on the isFlagged field:

alter table comment add index idxFlagged (isFlagged)

Additionally, take into account that your second query is just a join (there is no need of a subquery):

select description from Comment c
join FlaggedComment fc on c.commentId = fc.fkCommentId

I bet it should take longer to execute than traversing the index on the isFlagged field. Anyway, I would suggest you to benchmark those two solutions and report back the results :)

Mosty Mostacho
  • 42,742
  • 16
  • 96
  • 123
  • The amount of time longer it will take, if any, is likely *inconsequential* more for correct indices as it will be a simple hash-match (or as the query planner decides). RDBMS' are designed for this sort of thing. –  Apr 13 '12 at 01:58
  • Personally, I would have it as a column in the comments table as "it's an attribute of a comment" as it is written now. However, a more complicated flagging system and... –  Apr 13 '12 at 02:23
  • Odds are good that PostgreSQL won't use any kind of index on a Boolean column. You can test it with `EXPLAIN ANALYZE `. – Mike Sherrill 'Cat Recall' Apr 13 '12 at 11:42
0

If performance is your concern, I would recommend you test both scenarios, with representative amounts of (test) data.

For what it's worth - most database systems don't benefit from indices on fields with low cardinality - especially booleans. For a huge table, option 1 would require a table scan, which could be slow.

Option 2 should be faster, because you can index the commentID field - but only for the query you mention. Trying to find details about an individual comment would require a join, which might be slower.

In practice, unless you're working at extremes, it's unlikely that you'd have a measurable performance impact. That's why you should build a test rig, and try it out - the theoretical differences may have no real-world impact.

Community
  • 1
  • 1
Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52