We are going to be using one of the system tables in PostgreSQL to get row count estimates for several tables. I understand that for these estimates to be accurate, the statistics need to be kept up to date; the wiki here states that one should make sure "you have been running ANALYZE on the table enough to keep these statistics up to date".
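
For context, this is the kind of estimate query I mean; a minimal sketch assuming the estimate comes from `pg_class.reltuples` (`my_table` is a placeholder name):

```sql
-- Estimated row count from the planner statistics.
-- This value is refreshed by ANALYZE (and VACUUM), not by individual writes.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'my_table';  -- placeholder table name
```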

We expect certain tables to eventually be written to or updated quite frequently (my ballpark is around 150 writes a second; that feels "quite frequent" to me, but I'm not sure how it rates against real-life databases). Counts will happen about once a second, and they need to reflect changes to the table with some accuracy; for example, it would be strange if the count did not change after a couple of seconds during a burst of inserts.

What is a good value for "enough"? Is there some way to automate the running of ANALYZE on the tables? If more information is needed, please say so and I'll edit the post ASAP.

Thank you very much!

Juan Carlos Coto
  • You may want to explain a bit more about what your usage is going to be. How often are the tables going to be modified (particularly INSERT/DELETE operations)? The answer to your question largely depends on how often rows will be inserted or deleted, as well as how accurate you need those stats to be. – jcern Jan 29 '13 at 14:46
  • @Will - Not sure how this is off topic? A comment would be helpful to the person asking the question and to other people. – Chris Dutrow Feb 23 '13 at 20:13
  • @chris this isn't about managing an RDBMS? –  Feb 23 '13 at 20:26
  • @Will - At least part of it is about DBA admin. Some of it may require code, I wouldn't know for sure. If you think the question should be on http://dba.stackexchange.com/, probably leave a comment saying so. When you have the power to bypass the voting mechanism and close the question in a few seconds, as opposed to the 15-20 minutes someone may have taken to write it, it's probably best to exercise discretion in making that decision. If you do decide to close it, it's probably best to point the person who worked hard on the question in the right direction via a comment. – Chris Dutrow Feb 24 '13 at 05:04
  • @chris well put. We can use you over on [meta]. Unfortunately, we have MANY people who don't give... er, any care at all about the community or its standards (which are 2/3rds of why SO has been successful), so we have to be a bit proactive. The OP has his answer, and the question isn't deleted; closure is more of a reminder that we are trying to keep focus on answerable programming questions here. It is in the grey area, no doubt, and if it gets reopened I'm fine leaving it that way. It's not perfect, sorry. –  Feb 24 '13 at 07:34
  • @Will - It's all good. Sounds like a solid argument could be made for closure. The advantage of leaving a quick comment saying something like "Hey, probably better to post this on dba.stackexchange.com" is that instead of creating a frustrating dead end for the user, you've pointed them to a better resource and educated them about how to use the forum. – Chris Dutrow Feb 24 '13 at 17:34

1 Answer

Autovacuum should be able to handle what you are looking to do. If it is running, you can adjust the autovacuum_analyze_threshold parameter in postgresql.conf to meet your needs.
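
For example, a minimal postgresql.conf sketch; the values shown are the defaults and are illustrative, not recommendations for your workload:

```
# postgresql.conf -- illustrative values; test against your own write rate
autovacuum = on                     # the autovacuum daemon (on by default)
autovacuum_analyze_threshold = 50   # min. changed tuples before an ANALYZE is considered
autovacuum_naptime = 1min           # how often the launcher checks tables
```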

Per the documentation:

autovacuum_analyze_threshold (integer)

Specifies the minimum number of inserted, updated or deleted tuples needed to trigger an ANALYZE in any one table. The default is 50 tuples. This parameter can only be set in the postgresql.conf file or on the server command line. This setting can be overridden for individual tables by changing storage parameters.

Keep in mind that the number is not a guarantee of how often it will run, just a minimum threshold. As with most configuration options, it is a good idea to do some tests with several values to get the best trade-off between accuracy and performance and make sure it is meeting your needs.
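
To illustrate the per-table override the quote mentions, a sketch using a hypothetical table name (the value is illustrative):

```sql
-- Override the global threshold for one busy table via its storage parameters.
ALTER TABLE my_table SET (autovacuum_analyze_threshold = 100);
```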

jcern
  • It works in conjunction with `autovacuum_analyze_scale_factor`, so it would probably be a good idea to adjust both values: http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html#GUC-AUTOVACUUM-ANALYZE-SCALE-FACTOR –  Jan 29 '13 at 15:11
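
For reference, the two settings combine into a single trigger condition: autovacuum runs ANALYZE on a table once the tuples changed since the last ANALYZE exceed autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples. A sketch of estimating that trigger point, using the default values and a placeholder table name:

```sql
-- Approximate number of changed tuples needed before autovacuum re-ANALYZEs.
SELECT relname,
       50 + 0.1 * reltuples AS analyze_trigger_tuples  -- defaults: threshold 50, scale factor 0.1
FROM pg_class
WHERE relname = 'my_table';  -- placeholder table name
```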