Since you did not specify the index type, I'll assume default B-tree indexes. Other types can be a lot different.
Here is a simplistic function to compute the estimated minimum size in bytes for an index on the given table with the given columns:
CREATE OR REPLACE FUNCTION f_index_minimum_size(_tbl regclass, _cols VARIADIC text[], OUT estimated_minimum_size bigint)
LANGUAGE plpgsql AS
$func$
DECLARE
_missing_column text;
BEGIN
-- assert
SELECT i.attname
FROM unnest(_cols) AS i(attname)
LEFT JOIN pg_catalog.pg_attribute a ON a.attname = i.attname
AND a.attrelid = _tbl
WHERE a.attname IS NULL
INTO _missing_column;
IF FOUND THEN
RAISE EXCEPTION 'Table % has no column named %', _tbl, quote_ident(_missing_column);
END IF;
SELECT INTO estimated_minimum_size
COALESCE(1 + ceil(reltuples/trunc((blocksize-page_overhead)/(4+tuple_size)))::int, 0) * blocksize -- AS estimated_minimum_size
FROM (
SELECT maxalign, blocksize, reltuples, fillfactor, page_overhead
, (maxalign -- up to 16 columns, else nullbitmap may force another maxalign step
+ CASE WHEN datawidth <= maxalign THEN maxalign
WHEN datawidth%maxalign = 0 THEN datawidth
ELSE (datawidth + maxalign) - datawidth%maxalign END -- add padding to the data to align on MAXALIGN
) AS tuple_size
FROM (
SELECT c.reltuples, count(*)
, 90 AS fillfactor
, current_setting('block_size')::bigint AS blocksize
, CASE WHEN version() ~ '64-bit|x86_64|ppc64|ia64|amd64|mingw32' -- MAXALIGN: 4 on 32bits, 8 on 64bits
THEN 8 ELSE 4 END AS maxalign
, 40 AS page_overhead -- 24 bytes page header + 16 bytes "special space"
-- avg data width without null values
, sum(ceil((1-COALESCE(s.null_frac, 0)) * COALESCE(s.avg_width, 1024))::int) AS datawidth -- ceil() because avg width has a low bias
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_attribute a ON a.attrelid = c.oid
JOIN pg_catalog.pg_stats s ON s.schemaname = c.relnamespace::regnamespace::text
AND s.tablename = c.relname
AND s.attname = a.attname
WHERE c.oid = _tbl
AND a.attname = ANY(_cols) -- all exist, verified above
GROUP BY 1
) sub1
) sub2;
END
$func$;
Call examples:
SELECT f_index_minimum_size('my_table', 'col1', 'col2', 'col3');
SELECT f_index_minimum_size('public.my_table', VARIADIC '{col1, col2, col3}');
db<>fiddle here
About VARIADIC
parameters:
Basically, all indexes use data pages of typically 8 kb block size (rarely 4 kb). There is one data page overhead for B-tree indexes to start with. Each additional data page has a fixed overhead of 40 bytes (currently). Each page stores tuples like depicted in the manual here. Each tuple has a tuple header (typically 8 bytes incl. alignment padding), possibly a null bitmap, data (possibly incl. alignment padding between columns for multicolumn indices), and possibly alignment padding to the next multiple of MAXALIGN
(typically 8 bytes). Plus, there is an ItemId
of 4 bytes per tuple. Some space may be reserved initially for later additions with a fillfactor
- 90 % by default for B-tree indexes.
Important notes & disclaimers
The reported size is the estimated minimum size. An actual index will typically be bigger by around 25 % due to natural bloat from page splits. Plus, the calculation does not take possible alignment padding between multiple columns into account. Can add another couple percent (or more in extreme cases). See:
Estimations are based on column statistics in the view pg_stats
which is based on the system table pg_statistics
. (Using the latter directly would be faster, but only allowed for superusers.) In particular, the calculation is based on null_frac
, the "fraction of column entries that are null" and avg_width
, the "average width in bytes of column's entries" to compute an average data width - ignoring possible additional alignment padding for multicolumn indexes.
The default 90 % fillfactor
is taken into account. (One might specify a different one.)
Up to 50 % bloat is typically natural for B-tree indexes and nothing to worry about.
Does not work for expression indexes.
No provision for partial indexes.
Function raises an exception if anything but existing plain column names is passed. Case-sensitive!
If the table is new (or in any case if statistics may be out of date), be sure to run ANALYZE
on the table before calling the function to update (or even initiate!) statistics.
Due to major optimizations, B-tree indexes in Postgres 12 waste less space and are typically closer to the reported minimum size.
Does not account for deduplication that's introduced with Postgres 13, which can compact indexes with duplicate values.
Parts of the code are taken from ioguix' bloat estimation queries here:
More gory details i the Postgres source code here: