
I am trying to figure out an accurate representation of what percentage of a table's disk usage a certain column makes up. If you use a combination of solutions from PostgreSQL: Column Disk usage and How do you find the disk size of a Postgres / PostgreSQL table and its indexes, you'll get a query such as:

select sum(pg_column_size('column_1'))      as column_1,
       sum(pg_column_size('column_2'))      as column_2,
       pg_total_relation_size('table_name') as total_size
from table_name;

However, there are some peculiarities with the output I can't reconcile:

  1. The output of pg_total_relation_size is far larger than the sum of all of the individual column sizes (i.e. sum(pg_column_size('column_1')) + sum(pg_column_size('column_2')) + ...). Even if I subtract the size of the indexes on the table and the size of TOAST, the values are very different (see the size-breakdown sketch after this list). For example, the sum of all of my column sizes is 1217720 bytes, but according to PG the size of the table (excluding indexes and TOAST) is 6963200. Where can that large a discrepancy be coming from?

  2. If I run the query above against two entirely different servers and then compute each column's percentage as sum(pg_column_size('column_1')) / (sum(pg_column_size('column_1')) + sum(pg_column_size('column_2'))), the percentages come out identical, despite wildly different data sets of strings, jsonb, etc.
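
For reference, the subtraction mentioned in item 1 can be done with the standard size functions (a sketch; table_name stands in for the real table):

select pg_total_relation_size('table_name') as total,      -- heap + indexes + TOAST
       pg_indexes_size('table_name')        as indexes,    -- all indexes on the table
       pg_table_size('table_name')
         - pg_relation_size('table_name')   as toast_etc,  -- TOAST plus FSM/VM forks
       pg_relation_size('table_name')       as heap_only;  -- main data fork only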

I feel like I am missing something obvious but I can't figure it out. Any ideas?

aarbor
  • It seems like using `sum(octet_length(column_1::text))` for text/varchar/json/jsonb/tsvector columns gives me a more accurate representation of what I'm looking for – aarbor Feb 24 '22 at 21:19
  • pg_column_size('column_1') gives you the size of the string 'column_1' (including the size marker) – jjanes Feb 24 '22 at 22:35
  • @jjanes I can't believe I missed that, thank you! – aarbor Feb 24 '22 at 23:25
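
Following up on the comments: dropping the quotes so pg_column_size receives the column value rather than a string literal gives the intended per-column measurement (a sketch of the corrected query):

select sum(pg_column_size(column_1))        as column_1,   -- per-value stored size, reflecting compression
       sum(pg_column_size(column_2))        as column_2,
       pg_total_relation_size('table_name') as total_size  -- regclass argument, so the quotes are correct here
from table_name;

Any remaining gap between the summed column sizes and the heap size is expected: per-row tuple headers, item pointers, page headers, and alignment padding are overhead that is not attributed to any column.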

0 Answers