I have a table in Redshift. How can I see how much disk space it uses?

- hi diemacht, please check my answer; there is an error in the accepted answer if your block size configuration is not the default one. Maybe this is affecting you without you knowing it. – Diego Jul 30 '15 at 09:56
4 Answers
Use the queries from this presentation: http://www.slideshare.net/AmazonWebServices/amazon-redshift-best-practices
Analyze disk space usage for the cluster:
select
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
b.mbytes,
a.rows
from (
select db_id, id, name, sum(rows) as rows
from stv_tbl_perm a
group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (
select tbl, count(*) as mbytes
from stv_blocklist
group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;
Analyze table distribution between nodes:
select slice, col, num_values, minvalue, maxvalue
from svv_diskusage
where name = '__INSERT__TABLE__NAME__HERE__' and col = 0
order by slice, col;
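If you are on a reasonably recent Redshift release, the svv_table_info system view should give a shorter route to roughly the same numbers; as far as I know, its size column is already expressed in 1 MB blocks (and skew_rows gives a quick read on distribution), so the block arithmetic is done for you. A minimal sketch:
-- size per table in MB, largest first (sketch; requires access to svv_table_info)
select "schema", "table", size as size_mb, tbl_rows
from svv_table_info
order by size desc;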

I know this question is old and already has an accepted answer, but I must point out that the answer is wrong. What the query outputs as "mbytes" is actually the number of blocks. The answer is correct only if the block size is 1 MB (which is the default).
If the block size is different (in my case, for example, it is 256 KB), you have to multiply the number of blocks by the block size in bytes. I suggest the following change to the query, where I multiply the number of blocks by the block size in bytes (262144) and then divide by (1024*1024) to output the total in megabytes:
select
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
b.mbytes as previous_wrong_value,
(b.mbytes * 262144)::bigint/(1024*1024) as "Total MBytes",
a.rows
from (
select db_id, id, name, sum(rows) as rows
from stv_tbl_perm a
group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (
select tbl, count(blocknum) as mbytes
from stv_blocklist
group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;
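To make the unit conversion concrete, here is a worked example with made-up numbers: with 256 KB blocks, a table occupying 4,000 blocks holds 4000 * 262144 = 1,048,576,000 bytes, i.e. 1,000 MB, whereas counting blocks directly would have reported it as 4,000 MB.
-- illustration only: 4,000 blocks of 256 KB each add up to 1,000 MB
select (4000 * 262144)::bigint / (1024*1024) as total_mbytes;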

- Is it possible to change the block size in Redshift? I've been searching for information about this for a while and have not found any way to do so. – Thomas Andrews Nov 05 '15 at 19:28
- I believe you can. On the former ParAccel (now Actian Matrix, Redshift's predecessor), you could control that by changing the value of block_size in padb.conf. On Redshift it should be something along the same lines. – Diego Nov 10 '15 at 13:00
Adding owner and a schema filter to the above query:
select
cast(use.usename as varchar(50)) as owner,
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
b.mbytes,
a.rows
from
(select
db_id,
id,
name,
sum(rows) as rows
from stv_tbl_perm a
group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
left join pg_user use on (pgc.relowner = use.usesysid)
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
-- leave out system schemas
and pgn.nspowner > 1
join pg_database as pgdb on pgdb.oid = a.db_id
join
(select
tbl,
count(*) as mbytes
from stv_blocklist
group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;
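If you want to go the other way and restrict the output to a single schema rather than just hiding the system ones, a minimal sketch could look like this ('my_schema' is a placeholder; the join relies on stv_blocklist.tbl matching pg_class.oid for permanent tables, which is an assumption to verify on your cluster):
-- blocks per table for one schema, largest first
select
trim(pgn.nspname) as schema_name,
trim(pgc.relname) as table_name,
count(*) as blocks
from stv_blocklist b
join pg_class pgc on pgc.oid = b.tbl
join pg_namespace pgn on pgn.oid = pgc.relnamespace
where trim(pgn.nspname) = 'my_schema' -- placeholder, use your schema name
group by 1, 2
order by blocks desc;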

Just thought I would expand on this, as I am facing an issue of uneven distribution. I've added some joins and fields to enable analysis of space usage by node and slice. Also added are the max/min values and the number of values per slice for column 0.
select
cast(use.usename as varchar(50)) as owner,
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
a.node,
a.slice,
b.mbytes,
a.rows,
a.num_values,
a.minvalue,
a.maxvalue
from
(select
a.db_id,
a.id,
s.node,
s.slice,
a.name,
d.num_values,
d.minvalue,
d.maxvalue,
sum(rows) as rows
from stv_tbl_perm a
inner join stv_slices s on a.slice = s.slice
inner join (
select tbl, slice, sum(num_values) as num_values, min(minvalue) as minvalue, max(maxvalue) as maxvalue
from svv_diskusage
where col = 0
group by 1, 2) d on a.id = d.tbl and a.slice = d.slice
group by 1, 2, 3, 4, 5, 6, 7, 8
) as a
join pg_class as pgc on pgc.oid = a.id
left join pg_user use on (pgc.relowner = use.usesysid)
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
-- leave out system schemas
and pgn.nspowner > 1
join pg_database as pgdb on pgdb.oid = a.db_id
join
(select
tbl,
slice,
count(*) as mbytes
from stv_blocklist
group by tbl, slice
) b on a.id = b.tbl
and a.slice = b.slice
order by mbytes desc, a.db_id, a.name, a.node;
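If the full per-slice listing above is more detail than you need, a quick way to spot skew is to compare the smallest and largest per-slice block counts for each table. A sketch along the same lines as the queries above:
-- per-table min/max block counts across slices; a large gap suggests skew
select
trim(t.name) as table_name,
min(b.blocks) as min_blocks_per_slice,
max(b.blocks) as max_blocks_per_slice
from (
select tbl, slice, count(*) as blocks
from stv_blocklist
group by tbl, slice
) b
join stv_tbl_perm t on t.id = b.tbl and t.slice = b.slice
group by 1
order by max_blocks_per_slice desc;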