I have a table in Redshift. How can I see how much disk space it uses?

- hi diemacht, please check my answer; there is an error in the accepted answer if your block size configuration is not the default one. Maybe this is affecting you without you knowing it. – Diego Jul 30 '15 at 09:56
4 Answers
Use the queries from this presentation: http://www.slideshare.net/AmazonWebServices/amazon-redshift-best-practices
Analyze disk space usage for the cluster:
select
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
b.mbytes,
a.rows
from (
select db_id, id, name, sum(rows) as rows
from stv_tbl_perm a
group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (
select tbl, count(*) as mbytes
from stv_blocklist
group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;
Analyze table distribution between nodes:
select slice, col, num_values, minvalue, maxvalue
from svv_diskusage
where name = '__INSERT__TABLE__NAME__HERE__' and col = 0
order by slice, col;
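If you are on a reasonably recent Redshift release, the svv_table_info system view should give a shorter route to roughly the same numbers; as far as I know, its size column is already expressed in 1 MB blocks (and skew_rows gives a quick read on distribution), so the block arithmetic is done for you. A minimal sketch:
-- size per table in MB, largest first (sketch; requires access to svv_table_info)
select "schema", "table", size as size_mb, tbl_rows
from svv_table_info
order by size desc;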

I know this question is old and already has an accepted answer, but I must point out that the answer is wrong. What the query outputs as "mbytes" is actually the number of blocks. The answer is correct only if the block size is 1 MB (which is the default).
If the block size is different (in my case, for example, it is 256 KB), you have to multiply the number of blocks by the block size in bytes. I suggest the following change to the query, where I multiply the number of blocks by the block size in bytes (262144) and then divide by (1024*1024) to output the total in megabytes:
select
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
b.mbytes as previous_wrong_value,
(b.mbytes * 262144)::bigint/(1024*1024) as "Total MBytes",
a.rows
from (
select db_id, id, name, sum(rows) as rows
from stv_tbl_perm a
group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (
select tbl, count(blocknum) as mbytes
from stv_blocklist
group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;
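To make the unit conversion concrete, here is a worked example with made-up numbers: with 256 KB blocks, a table occupying 4,000 blocks holds 4000 * 262144 = 1,048,576,000 bytes, i.e. 1,000 MB, whereas counting blocks directly would have reported it as 4,000 MB.
-- illustration only: 4,000 blocks of 256 KB each add up to 1,000 MB
select (4000 * 262144)::bigint / (1024*1024) as total_mbytes;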

- Is it possible to change the block size in Redshift? I've been searching for information about this for a while and have not found any way to do so. – Thomas Andrews Nov 05 '15 at 19:28
- I believe you can. On the former ParAccel (now Actian Matrix, Redshift's predecessor), you could control that by changing the value of block_size in padb.conf. On Redshift it should be something along the same lines. – Diego Nov 10 '15 at 13:00
Adding owner and a schema filter to the above query:
select
cast(use.usename as varchar(50)) as owner,
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
b.mbytes,
a.rows
from
(select
db_id,
id,
name,
sum(rows) as rows
from stv_tbl_perm a
group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
left join pg_user use on (pgc.relowner = use.usesysid)
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
-- leave out system schemas
and pgn.nspowner > 1
join pg_database as pgdb on pgdb.oid = a.db_id
join
(select
tbl,
count(*) as mbytes
from stv_blocklist
group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;
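If you want to go the other way and restrict the output to a single schema rather than just hiding the system ones, a minimal sketch could look like this ('my_schema' is a placeholder; the join relies on stv_blocklist.tbl matching pg_class.oid for permanent tables, which is an assumption to verify on your cluster):
-- blocks per table for one schema, largest first
select
trim(pgn.nspname) as schema_name,
trim(pgc.relname) as table_name,
count(*) as blocks
from stv_blocklist b
join pg_class pgc on pgc.oid = b.tbl
join pg_namespace pgn on pgn.oid = pgc.relnamespace
where trim(pgn.nspname) = 'my_schema' -- placeholder, use your schema name
group by 1, 2
order by blocks desc;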

Just thought I would expand on this, as I am facing an issue of uneven distribution. I've added some joins and fields to enable analysis of space usage by node and slice. Also added are the max/min values and the number of values per slice for column 0.
select
cast(use.usename as varchar(50)) as owner,
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
a.node,
a.slice,
b.mbytes,
a.rows,
a.num_values,
a.minvalue,
a.maxvalue
from
(select
a.db_id,
a.id,
s.node,
s.slice,
a.name,
d.num_values,
d.minvalue,
d.maxvalue,
sum(rows) as rows
from stv_tbl_perm a
inner join stv_slices s on a.slice = s.slice
inner join (
select tbl, slice, sum(num_values) as num_values, min(minvalue) as minvalue, max(maxvalue) as maxvalue
from svv_diskusage
where col = 0
group by 1, 2) d on a.id = d.tbl and a.slice = d.slice
group by 1, 2, 3, 4, 5, 6, 7, 8
) as a
join pg_class as pgc on pgc.oid = a.id
left join pg_user use on (pgc.relowner = use.usesysid)
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
-- leave out system schemas
and pgn.nspowner > 1
join pg_database as pgdb on pgdb.oid = a.db_id
join
(select
tbl,
slice,
count(*) as mbytes
from stv_blocklist
group by tbl, slice
) b on a.id = b.tbl
and a.slice = b.slice
order by mbytes desc, a.db_id, a.name, a.node;
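If the full per-slice listing above is more detail than you need, a quick way to spot skew is to compare the smallest and largest per-slice block counts for each table. A sketch along the same lines as the queries above:
-- per-table min/max block counts across slices; a large gap suggests skew
select
trim(t.name) as table_name,
min(b.blocks) as min_blocks_per_slice,
max(b.blocks) as max_blocks_per_slice
from (
select tbl, slice, count(*) as blocks
from stv_blocklist
group by tbl, slice
) b
join stv_tbl_perm t on t.id = b.tbl and t.slice = b.slice
group by 1
order by max_blocks_per_slice desc;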