
My data is similar to currency data in many respects, so I will use it for demonstration.

I have 10-15 different groups of data; think of them as different currencies, like Dollar or Euro.

They need to have these columns:

timestamp INT PRIMARY KEY
value INT

Each of them will have more than 1 billion rows, and I will append new rows as time passes.

I will just select rows over certain intervals and create graphs, probably with multiple currencies on the same graph.

The question is: should I add a group column and store everything in one table, or keep the groups separate? If they are all in one table, the timestamp will no longer be unique, and I would probably need more advanced SQL techniques to keep it efficient.

maydin
    Can you elaborate on what "If they are in same column timestamp will not be unique" means and what the problem was if the timestamps weren't unique? – sticky bit Apr 11 '19 at 21:13
  • There will be multiple rows with the same timestamp; at a given timestamp there will be more than one row, with different currencies. – maydin Apr 11 '19 at 22:23

1 Answer


10 - 15 "currencies"? 1 billion rows each? Consider list partitioning in Postgres 11 or later. This way, the timestamp column stays unique per partition. (Although I am not sure why that is a necessity.)

Or simply have 10 - 15 separate tables without storing the "currency" redundantly per row. Size matters with this many rows.
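
A hedged sketch of that layout, with made-up table names:

```sql
-- One plain table per "currency"; no currency column stored at all.
CREATE TABLE rates_usd (ts int PRIMARY KEY, value int);
CREATE TABLE rates_eur (ts int PRIMARY KEY, value int);
-- ... and so on for the remaining "currencies"
```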

Or, if you typically have multiple values (one for each "currency") for the same timestamp, you might use a single table with 10-15 dedicated "currency" columns. Much smaller overall, as it saves the tuple overhead for each "currency" (28 bytes per row or more). See:

The practicality of a single row for multiple "currencies" depends on detailed specs. For example: might not work so well for many updates on individual values.
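
A rough sketch of that wide-table layout, assuming one value per "currency" and timestamp; the column names are invented for the example:

```sql
-- One row carries all "currencies", so the per-row tuple overhead is paid
-- only once per timestamp.
CREATE TABLE rates_wide (
    ts  int PRIMARY KEY,
    usd int,
    eur int
    -- ... one column per remaining "currency", NULL where a value is missing
);
```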

You added:

I have read about clustered indexes, which order data in physical order on disk. I will not insert new rows in the middle of the table.

That seems like a perfect use case for BRIN indexes, which are dramatically smaller than their B-tree relatives. Typically a bit slower, but with your setup maybe even faster. Related:
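
For example, a minimal sketch of such an index, reusing the illustrative partitioned table from the first sketch (index and table names are not from the question):

```sql
-- A BRIN index on the timestamp column; small because it stores only
-- per-block ranges, which works well when rows are appended in ts order.
CREATE INDEX rates_ts_brin ON rates USING brin (ts);
```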

Erwin Brandstetter
  • Uniqueness is not a necessity, but if there is more than one row per timestamp, it might not be efficient for indexing etc. – maydin Apr 12 '19 at 16:57
  • I made some tests and list partitioning fits my situation really well. Adding a "type" column for partitioning increases the table size by 17%, but the index on timestamp stays the same, at about 50% of the table size for 30 million rows (1333 to 1571 MB table size, plus 676 MB index size). But I have read about clustered indexes, which order data in physical order on disk. I will not insert new rows in the middle of the table, so is it possible to use clustered indexes to get rid of the index overhead? – maydin Apr 24 '19 at 19:41
  • Postgres does not have "clustered indexes", but there are BRIN indexes. I added above. – Erwin Brandstetter Apr 24 '19 at 19:50