0

From someone with more experience than myself, would it be a better idea to simply count the number of items in a table (such as counting the number of topics in a category) or to keep a variable that holds that value and just increment and call it (an extra field in the category table)?

Is there a significant difference between the two or is it just very slight, and even if it is slight, would one method still be better than the other? It's not for any one particular project, so please answer generally (if that makes sense) rather than based on something like the number of users.

Thank you.

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
Ruffy
  • 835
  • 1
  • 10
  • 17
  • maybe you should clarify your intent and your thinking. Is it as simple as "Why do we always use COUNT(*) in SQL?" for example – gbn Jul 31 '11 at 23:58

4 Answers4

4

To get the number of items (rows in a table), you'd use standard SQL and do it on demand

SELECT COUNT(*) FROM MyTable

Note, in case I've missed something, each item (row) in the table has some unique identifier, whether it's a part number, some code, or an auto-increment. So adding a new row could trigger the "auto-increment" of a column.

This is unrelated to "counting rows". Because of DELETEs or ROLLBACK, numbers may not be contiguous.

Trying to maintain row counts separately will end in tears and/or disaster. Trying to use COUNT(*)+1 or MAX(id)+1 to generate a new row identifier is even worse

gbn
  • 422,506
  • 82
  • 585
  • 676
  • Since you still have to do a DB query to get the value, might as well get the value at once. Then again, counting rows is probably one of the simplest queries out there, and it's bound to be fast(er than getting a straight value). – Christian Jul 31 '11 at 23:51
  • 1
    @Christian Sciberras: The trade off for not using COUNT is the concurrency issues of maintaining the count separately... – gbn Jul 31 '11 at 23:55
  • I try to avoid the wildcard * as much as possible, especially when a function is evaluating it as well. If the data set is GIANT, it will be significantly slower to pull in all of the data, as opposed to the one unique (and most important) identifier that is common in the data design. – Blake Jul 31 '11 at 23:57
  • @Blake: `SELECT COUNT(*)` *is* an optimisation. It isn't `SELECT *`. See these please http://stackoverflow.com/questions/1747990/performance-of-count-sql-function/1748028#1748028 and http://stackoverflow.com/questions/3003457/count-vs-countcolumn-name-which-is-more-correct/3003533#3003533 and (most complete) http://dba.stackexchange.com/questions/2511/what-is-the-difference-between-select-count-and-select-countany-non-null-col/2512#2512 – gbn Jul 31 '11 at 23:59
  • Maintaining a row count separately is not guaranteed to end in disaster. [Some platforms and usage modes](http://code.google.com/appengine/articles/sharding_counters.html) require that you do exactly that. When trying to count the number of rows in the result-set of a complex query, using `COUNT()` will be nearly as expensive as actually executing the query. For the sake of performance and scalability it can be necessary to maintain separate counters to handle such cases. – aroth Aug 01 '11 at 00:00
  • @gbn I refer to it in this specific instance as well as in my MySQL queries in general. If he wants to count two unique sub-forums, ie Colors => [Red, Blue] he could do so with one query. `count(red)` as well as `count(blue)`. **shrug** I just try to plan ahead for flexibility. – Blake Aug 01 '11 at 00:02
  • @aroth: If you have big enough data and queries that `COUNT(*)` is not good enough, concurrency of maintaining a separate column *will* be an issue. For 99.99+% of databases, `COUNT(*)` *is* the best way – gbn Aug 01 '11 at 00:03
  • @gbn - Yes, it's not a common case, but it does come up. And a bit more often than 0.01% of the time, I would say. My point was just that you made it sound like there are no valid use-cases for maintaining separate row counts and no way to do it correctly, which is not really the case. There are times when it's a necessary evil, and while it requires you to think about synchronization issues it can certainly be done without too much drama. – aroth Aug 01 '11 at 00:12
  • @aroth: Your link states "about five times a second". That's trivial. I work on systems with 50,000 new rows per second: that's what I mean about concurrency. My answer is for mainstream RDBMS that is expected to scale to high write volumes. The App Engine docs also state "The App Engine datastore is not like a traditional relational database." and it also looks like an OODBMS. The question is tagged MySQL/SQL. Your comments are perhaps irrelevant... – gbn Aug 01 '11 at 00:16
  • 1
    @gbn Replying to your comment about mine. The trade-of isn't just that, but it affects performance. Whereas a write is done once with an `INSERT `, it's done twice in an `INSERT ; UPDATE `. The point is that implementation of manual increment will probably end up being slightly slower than just doing a COUNT() query. – Christian Aug 01 '11 at 06:34
1

I think there is some confusion about your question. My interpretation is whether you want to do a select count(*) or a column where you track your actual count.

I would not add such a column, if you don't have reasons to do so. This is premature optimization and you complicate your software design.

Also, you want to avoid having the same information stored in different places. Counting is a trivial task, so you actually duplicating information, which is a bad idea.

duedl0r
  • 9,289
  • 3
  • 30
  • 45
0

In my forum I count the sub-threads in a forum like this:

SELECT COUNT(forumid) AS count FROM forumtable

As long as you're using an identifier that is the same to specify what forum and/or sub-section, and the column has an index key, it's very fast. So there's no reason to add more columns than you need to.

Blake
  • 2,294
  • 1
  • 16
  • 24
  • I think he wants to count elements in a table..auto_increment is not good for that. – duedl0r Jul 31 '11 at 23:46
  • Yes, I was comparing counting the number of items in a table to storing the number as a field, such as counting the number of topics created in a forum in that forum category's table as a field rather than recounting it every time you look at the statistics. – Ruffy Jul 31 '11 at 23:49
  • Oh, it seemed to me like he was trying to count the rows in a table in order to provide an id to add one more. Let me adjust my answer then. – Blake Jul 31 '11 at 23:49
0

I'd go with just counting. If you notice a performance issue, you can consider other options, but as soon as you keep a value that's separate, you have to do some work to make sure it's always correct. Using COUNT() you always get the actual number "straight from the horse's mouth" so to speak.

Basically, don't start optimizing until you have to. If everything works fine and fast using COUNT(), then do that. Otherwise, store the count somewhere, but rather than adding/subtracting to update the stored value, run COUNT() when needed to get the new number of items

Flambino
  • 18,507
  • 2
  • 39
  • 58