
In the data that I have, there are around 1M rows, each with around 60-70 columns. However, only a few rows (20-30) have the columns beyond column 30 filled, i.e. the table is sparse. Also, the columns beyond 30 are rarely queried.

Does "number of columns" impact performance?

Should I make two tables: one with the first 30 columns, and the second being the original table?

Or should I keep the original structure?

Table schema:

```sql
CREATE TABLE entity_table (
    entity_id int,
    tag_1     text,
    tag_2     text,
    -- ...
    tag_30    text,  -- up to column 30 the table is dense
    tag_31    text,
    -- ...
    tag_70    text   -- sparse columns
);
```

Also, does the type of these columns affect performance? Does Postgres index NULL values, and how can I prevent that?

Maniish
  • Sounds like a rather strange data model, but that's just a first impression because you don't show us the model. An empty column (that is, NULL) hardly has any impact on performance, and a million records is almost nothing for a database. – Frank Heikens Mar 12 '22 at 12:24
  • It depends on column types, total row size and many more things. Read through the `postgres` documentation. It has various performance notes. – user14063792468 Mar 12 '22 at 13:43
  • Related: https://stackoverflow.com/a/12147130/939860 – Erwin Brandstetter Mar 12 '22 at 14:39
  • There are five or six questions in here. – David Aldridge Mar 13 '22 at 15:37
  • Using 2 tables (one with 30 columns and one with the rest of them) and joining them when needed will be a better choice. https://dba.stackexchange.com/a/210294 – IVO GELOV Mar 13 '22 at 17:58

1 Answer


Does "number of columns" impact performance? Short answer is "Yes, but don't worry about it."

More precisely, extra columns eat space, and that space has to travel to and from disk, takes up cache, and so on, all of which costs resources. The exact amount of space depends on the column's data type and is listed alongside each type in the Postgres data type documentation: https://www.postgresql.org/docs/14/datatype.html
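If you want to check this on real data rather than estimate it, Postgres can report value and row sizes directly. A quick sketch using the standard pg_column_size function and the table name from the question (the text literal is only an example):

```sql
-- Size in bytes of a single value (the literal is only an example)
SELECT pg_column_size('some tag value'::text);

-- On-disk size in bytes of one whole row of the question's table
SELECT pg_column_size(t.*) AS row_bytes
FROM   entity_table t
LIMIT  1;
```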

As Frank Heikens commented, a million rows isn't a lot these days. At 70 columns and 8 bytes per column, a million rows come to roughly 560 MB, which will happily fit in memory on a Raspberry Pi, so it shouldn't be that big of a deal.
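You can also ask Postgres for the actual on-disk size instead of relying on the back-of-envelope figure; these are standard admin functions, and the table name is taken from the question:

```sql
-- Heap only
SELECT pg_size_pretty(pg_table_size('entity_table'));

-- Heap plus indexes and TOAST
SELECT pg_size_pretty(pg_total_relation_size('entity_table'));
```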

However, when you get to billions or trillions of rows, all those little bytes really start adding up. Hence you might look at trimming the hot rows, for example by moving the rarely used columns elsewhere, as suggested in the comments.
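One way to do that is the vertical split raised in the question and in IVO GELOV's comment: keep the 30 dense columns in the main table and move the rarely filled columns 31-70 into a side table that only has rows for the handful of entities that use them. This is only a rough sketch; the column names come from the question, while the table names entity_main and entity_extra_tags are made up for illustration:

```sql
-- Dense part: one row per entity, columns that are always filled
CREATE TABLE entity_main (
    entity_id int PRIMARY KEY,
    tag_1     text,
    -- ...
    tag_30    text
);

-- Sparse part: only the few entities that actually use tags 31-70 get a row here
CREATE TABLE entity_extra_tags (
    entity_id int PRIMARY KEY REFERENCES entity_main (entity_id),
    tag_31    text,
    -- ...
    tag_70    text
);

-- The rare queries that need the extra columns join the two tables
SELECT m.entity_id, m.tag_1, x.tag_31
FROM   entity_main m
LEFT   JOIN entity_extra_tags x USING (entity_id);
```

On the NULL question: Postgres B-tree indexes do store NULL entries, and the usual way to keep them out of an index is a partial index, e.g. `CREATE INDEX ON entity_table (tag_31) WHERE tag_31 IS NOT NULL;`.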

Richard Wheeldon