
In the data that I have, there are around 1M rows, each with around 60-70 columns. However, only a few rows (20-30) have the columns beyond column 30 filled, i.e. the table is sparse. Also, the columns beyond 30 are rarely queried.

Does "number of columns" impact performance?

Should I make two tables: one with the first 30 columns, and the second being the original table?

Or should I keep the original structure?

Table schema:

```sql
CREATE TABLE entity_table (
    entity_id int,
    tag_1     text,
    tag_2     text,
    -- ...
    tag_30    text,  -- up to column 30 the table is dense
    tag_31    text,
    -- ...
    tag_70    text   -- sparse columns
);
```

Also, does the type of these columns affect performance? Does Postgres index NULL values, and how can I prevent that?

Maniish
  • Sounds like a rather strange data model, but that's just a first impression because you don't show us the model. An empty column (that is, NULL) hardly has any impact on performance, and a million records is almost nothing for a database. – Frank Heikens Mar 12 '22 at 12:24
  • It depends on column types, total row size and many more things. Read through the `postgres` documentation. It has various performance notes. – user14063792468 Mar 12 '22 at 13:43
  • Related: https://stackoverflow.com/a/12147130/939860 – Erwin Brandstetter Mar 12 '22 at 14:39
  • There are five or six questions in here. – David Aldridge Mar 13 '22 at 15:37
  • Using 2 tables (one with 30 columns and one with the rest of them) and joining them when needed will be a better choice. https://dba.stackexchange.com/a/210294 – IVO GELOV Mar 13 '22 at 17:58

1 Answer


Does "number of columns" impact performance? Short answer is "Yes, but don't worry about it."

More precisely, extra columns eat space, and that space has to travel to and from disk, takes up cache, and so on, all of which costs resources. The exact amount of space depends on the column's data type and is listed alongside each type in the Postgres data type documentation: https://www.postgresql.org/docs/14/datatype.html
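If you want to check this on real data rather than estimate it, Postgres can report value and row sizes directly. A quick sketch using the standard pg_column_size function and the table name from the question (the text literal is only an example):

```sql
-- Size in bytes of a single value (the literal is only an example)
SELECT pg_column_size('some tag value'::text);

-- On-disk size in bytes of one whole row of the question's table
SELECT pg_column_size(t.*) AS row_bytes
FROM   entity_table t
LIMIT  1;
```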

As Frank Heikens commented, a million rows isn't a lot these days. At 70 columns and 8 bytes per column, a million rows come to roughly 560 MB, which will happily fit in memory on a Raspberry Pi, so it shouldn't be that big of a deal.
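You can also ask Postgres for the actual on-disk size instead of relying on the back-of-envelope figure; these are standard admin functions, and the table name is taken from the question:

```sql
-- Heap only
SELECT pg_size_pretty(pg_table_size('entity_table'));

-- Heap plus indexes and TOAST
SELECT pg_size_pretty(pg_total_relation_size('entity_table'));
```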

However, when you get to billions or trillions of rows, all those little bytes really start adding up. Hence you might look at trimming the hot rows, for example by moving the rarely used columns elsewhere, as suggested in the comments.
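One way to do that is the vertical split raised in the question and in IVO GELOV's comment: keep the 30 dense columns in the main table and move the rarely filled columns 31-70 into a side table that only has rows for the handful of entities that use them. This is only a rough sketch; the column names come from the question, while the table names entity_main and entity_extra_tags are made up for illustration:

```sql
-- Dense part: one row per entity, columns that are always filled
CREATE TABLE entity_main (
    entity_id int PRIMARY KEY,
    tag_1     text,
    -- ...
    tag_30    text
);

-- Sparse part: only the few entities that actually use tags 31-70 get a row here
CREATE TABLE entity_extra_tags (
    entity_id int PRIMARY KEY REFERENCES entity_main (entity_id),
    tag_31    text,
    -- ...
    tag_70    text
);

-- The rare queries that need the extra columns join the two tables
SELECT m.entity_id, m.tag_1, x.tag_31
FROM   entity_main m
LEFT   JOIN entity_extra_tags x USING (entity_id);
```

On the NULL question: Postgres B-tree indexes do store NULL entries, and the usual way to keep them out of an index is a partial index, e.g. `CREATE INDEX ON entity_table (tag_31) WHERE tag_31 IS NOT NULL;`.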

Richard Wheeldon