0

I use postgres and I need some help from you all PG experts... I am looking to track counts from a large set of source tables whose counts keep changing everyday. I want to use the tablename, row count and tablesize in a tracker table, and a column called created_dttm field to show when this row count is recorded from source table. This is for trending how the table counts are changing with time and look for peaks.

insert into tracker_table( tablename, rowcount, tablesize, timestamp)
from 
(
(select schema.tablename ... - not sure how to drive this to pick up a list of tables??
, select count(*) from schema.tablename
, SELECT pg_size_pretty(pg_total_relation_size('"schema"."tablename"'))
, select created_dttm from schema.tablename
)
);

Additionally, I want to get a particular column from source table for a fourth column. This would be a created_dttm timestamp field in the source table, and I want to run a simple sql to get this date to the tracker table. Any suggestions how to attack this problem?

wizengeeky
  • 67
  • 9
  • Is the list of tables you need to examine static, or does your process need to account for new tables coming in/being deleted? – sovemp Mar 24 '17 at 19:33
  • Thanks for your response, the list of tables is static – wizengeeky Mar 24 '17 at 19:36
  • You could simply write the same query for each table. A better way still, though, you would need to use dynamic SQL. What I would do, is make a function that gets passed the table name and dynamically executes the query you need. Then you'd just statically call said function on each table, or alternatively, query nformation_schema.tables (which you would have to do if your list of tables wasn't static. Also, this answer has some discussion of issues that are probably relevant you: http://stackoverflow.com/questions/32210193/run-the-same-query-against-multiple-tables-without-dynamic-sql – sovemp Mar 24 '17 at 19:41
  • Very interesting! I need to dig up the corresponding postgres sql notation. I see what you mean. I could do multiple select for the specific table and get the counts. I am looking for a way to get a loop of a list of tables which will then run the queries to populate to tracker. I dont want to do insert values 'table1', select... then insert 'table2', select.. individually populating the tablename field – wizengeeky Mar 24 '17 at 19:49
  • This kind of has what you mention: http://stackoverflow.com/questions/24618672/function-to-loop-through-and-select-data-from-multiple-tables Keep in mind, you can only do dynamic sql in a PG/PGSQL function in postgres. – sovemp Mar 24 '17 at 20:37
  • sovemp, thanks, but this does not work for my need right now. Good to learn about it :) – wizengeeky Mar 28 '17 at 18:48

1 Answers1

1

before reading the code please consider this:

  1. instead of selecting several subqueries, this if you can join them into one qry, eg select (select 1 from t), (select 2 from t) can be refactored to select 1,2 from t
  2. pg_total_relation_size is sum of data pages, so it is size of table, but not size of data in it.
  3. you need aggregation on your created_dttm column (I used oid instead), otherwise your subquery returns more then one row, so you won't be able to insert the result.
  4. instead of select count(*) maybe use pg_stat_all_tables stats?.. counting can be very expensive and acuracy of the count() is neglected by the fact that next minute same select count() will be different and you probably wont run this count every two seconds...

code:

t=# create table so30 (n text, c int, s text, o int);
CREATE TABLE

t=# do
$$
declare
  _r record;
  _s text;
begin
  for _r in (values('pg_database'),('pg_roles')) loop
    _s := format('select %1$L,(select count(*) from %1$I), (SELECT pg_size_pretty(pg_total_relation_size(%1$L))), (select max(oid) from %1$I)',_r.column1);
    execute format('insert into so30 %s',_s);
  end loop;
end;
$$
;
DO
t=# select * from so30;
      n      | c |    s    |   o
-------------+---+---------+-------
 pg_database | 4 | 72 kB   | 16384
 pg_roles    | 2 | 0 bytes |  4200
(2 rows)
Vao Tsun
  • 47,234
  • 13
  • 100
  • 132
  • Thanks, you have raised very interesting points. I think the looping structure comes close to what I will need, although your answer does not fully answer my question. I need to find a count for the tables by a timestamp. A table will have N timestamps, I want a count for all those timestamp, along with the larger nested loop for all the tables M similarly. – wizengeeky Mar 28 '17 at 18:47