
I want to store a set of hourly set points: 24 floats (0.0 to 1.0) that map hours of the day to an intensity value.

I could:

  1. have 24 columns named _0 to _23. (How can I extend db.Model in SQLAlchemy to use numbers as column names?)
  2. have a values table (id, value, hour_of_the_day).
  3. use the Postgres array type. (But it seems like I can't have fixed-length arrays, and I will need some application logic to get/set values.)

The values will mostly be queried together and scaled (so that the range 0 to 1 maps to a custom range).

Which model is preferable?


1 Answer


It very much depends on the use case and requirements.

1.

24 individual float columns, possibly nullable (your first option), are the optimum for storage size (and hence speed): 8 bytes per float (defaults to float8, a.k.a. double precision) or 4 bytes per real (float4), and no overhead. Since there are exactly 24 given columns, you'll never run into design problems like you would with a variable number.

A legal identifier in SQL can't start with a number, so you would always have to double-quote. Error-prone and confusing. Use a character prefix instead, like h0 .. h23.
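
A minimal sketch of 1., with hypothetical names (real, a.k.a. float4, is enough for values between 0 and 1):

    CREATE TABLE setpoints (
      id  serial PRIMARY KEY
    , h0  real
    , h1  real
    -- ... h2 through h22 declared the same way ...
    , h23 real
    );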

2.

A one-to-many table with a FK constraint to id of the main table. You would make that (id int, hour_of_the_day int, value float8), not (id, value, hour_of_the_day), so no space is wasted for alignment padding. Add a CHECK constraint to only allow values 0 - 23 for hour_of_the_day.

This multiplies storage size as compared to 1.: each separate row occupies around 44 bytes (28 bytes of overhead per row + 16 bytes for the data). You'll at least add a PK constraint on (id, hour_of_the_day), which is implemented with a btree index (similar page and row structure as tables). That adds another 36 bytes per row. So we end up with at least 80 bytes per value (plus overhead per data page and table). Roughly 10x the storage size of 1..
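
A sketch of 2. along those lines, assuming a main table main with primary key id (names are hypothetical again):

    CREATE TABLE setpoint (
      id              int    REFERENCES main  -- FK to id of the main table
    , hour_of_the_day int    CHECK (hour_of_the_day BETWEEN 0 AND 23)
    , value           float8
    , PRIMARY KEY (id, hour_of_the_day)
    );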

3.

Arrays? Don't. Storage is more expensive than 1. (array overhead of 24 bytes), handling is the least convenient, and enforcing integrity (max. / exact length) is not as simple.
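
If you use an array anyway, a sketch; the exact length can only be enforced with an explicit CHECK constraint (cardinality() requires Postgres 9.4 or later):

    CREATE TABLE setpoints_arr (
      id   serial PRIMARY KEY
    , vals real[] CHECK (cardinality(vals) = 24)  -- fixed length by constraint only
    );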

Some more pros and cons

With 2. you need a join of two tables to read data, which is more expensive. But there are various advantages, too. If you often need to read or write the main table without columns h0 - h23, you get that a bit cheaper. Updating individual values may be cheaper, since the main table remains unchanged. OTOH, inserting or updating all 24 values of the same id at once is typically more expensive (1 new row version vs. 24 new row versions). Deleting individual values is cheaper for 2..
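
Updating a single hour in the sketches above illustrates the difference:

    -- 1.: writes a new version of the complete 24-column row
    UPDATE setpoints SET h5 = 0.8 WHERE id = 1;

    -- 2.: writes a new version of one narrow row only
    UPDATE setpoint SET value = 0.8 WHERE id = 1 AND hour_of_the_day = 5;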

The values will mostly be queried together ...

That's a lot cheaper with 1..

... and scaled

That may be simpler with 2.: applying the same operation to one column vs. 24 different columns.
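
For instance, mapping 0 .. 1 to a hypothetical target range 10 .. 90:

    -- 2.: one expression covers all 24 hours
    SELECT id, hour_of_the_day, 10 + value * (90 - 10) AS scaled
    FROM   setpoint;

    -- 1.: the same expression has to be repeated for each column
    SELECT id
         , 10 + h0 * (90 - 10) AS h0
         , 10 + h1 * (90 - 10) AS h1
         -- ... h2 through h23 alike ...
    FROM   setpoints;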

Etc. As I said, it very much depends on the use case and requirements.
