I am trying to figure out an accurate representation of what percentage of a table's disk usage a certain column makes up. If you use a combinations of solutions from PostgreSQL: Column Disk usage and How do you find the disk size of a Postgres / PostgreSQL table and its indexes, you'll get a solution such as:
select sum(pg_column_size('column_1')) as column_1,
sum(pg_column_size('column_2')) as column_2,
pg_total_relation_size('table_name') as total_size
from table_name;
However, there are some peculiarities with the output I can't reconcile:
The output of
pg_total_relation_size
is way larger than the sum of all of the individual columns (i.e.sum(pg_column_size('column_1')) + sum(pg_column_size('column_2'))
... Even if I subtract the size of indexes on the table and size of TOAST, the values are very different. For example, the sum of all of my column sizes is equal to1217720
bytes, but according to PG the size of the table (excluding indexes and toast) is6963200
. Where can that large of a discrepancy be coming from?If I run the query above against two entirely different servers, and then compute my percentages as
sum(pg_column_size('column_1')) / (sum(pg_column_size('column_1')) + sum(pg_column_size('column_2'))
the percentages are identical, despite wildly different data sets of strings, jsonb, etc.
I feel like I am missing something obvious but I can't figure it out. Any ideas?