
I have a system producing about 5TB of time-tagged numeric data every year. The fields tend to be different for each row, and to avoid having heaps of NULLs I'm thinking of using Postgres as a document store with JSONB.
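To make this concrete, here's a minimal sketch of the kind of table I have in mind (the table and field names are made up):

```sql
-- Sketch only; "readings" and the field names are hypothetical.
CREATE TABLE readings (
    id       bigserial PRIMARY KEY,
    taken_at timestamptz NOT NULL,  -- the only field every row has
    data     jsonb NOT NULL         -- everything else, sparse
);

-- Each row carries only the fields that were actually measured,
-- so there are no NULL columns:
INSERT INTO readings (taken_at, data) VALUES
    ('2016-11-22 04:00+00', '{"temp_c": 21.4, "door_open": true}'),
    ('2016-11-22 04:01+00', '{"pressure_kpa": 101.2}');
```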

However, GIN indexes on JSONB columns don't seem to be made for numerical and datetime data: they support containment and key-existence operators, but there are no indexable inequality or range operators for numbers and dates.
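Using the sketch table above, for example:

```sql
CREATE INDEX readings_data_gin ON readings USING gin (data);

-- Can use the GIN index (containment test):
SELECT * FROM readings WHERE data @> '{"door_open": true}';

-- Cannot use it: there is no indexable range operator for JSONB
-- values, so this falls back to a sequential scan:
SELECT * FROM readings WHERE (data->>'temp_c')::numeric > 20;
```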

Here they suggest special constructs with LATERAL to treat JSON values as normal numeric columns, and here someone proposes storing dates in a "sortable" string format and filtering on string ranges.
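My understanding of those suggestions is roughly the following (a sketch only; the field name is made up, and it assumes every stored value casts cleanly to numeric):

```sql
-- LATERAL to expose a JSON value as an ordinary numeric column:
SELECT r.taken_at, v.temp_c
FROM   readings r,
       LATERAL (SELECT (r.data->>'temp_c')::numeric AS temp_c) v
WHERE  v.temp_c > 20;

-- A B-tree expression index on the cast makes range filters on this
-- one field indexable; one such index would be needed per field.
-- (Note the extra parentheses required around the expression.)
CREATE INDEX readings_temp_c_idx
    ON readings (((data->>'temp_c')::numeric));
```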

These solutions sound a bit hacky and I wonder about their performance. Perhaps this is not a good application for JSONB?

An alternative I can think of, staying relational, is the sixth normal form (6NF): one table for each (optional) field, of which, however, there would be hundreds. It sounds like a big JOIN mess, and new tables would have to be created on the fly whenever a new field pops up. But maybe it's still better than a super-slow JSONB implementation.
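Something like this, I imagine (a sketch with two of the hundreds of fields; all names made up):

```sql
-- Base table: just the id and the always-present date.
CREATE TABLE readings_6nf (
    id       bigint PRIMARY KEY,
    taken_at timestamptz NOT NULL
);

-- One narrow table per optional field:
CREATE TABLE reading_temp_c (
    reading_id bigint PRIMARY KEY REFERENCES readings_6nf (id),
    value      numeric NOT NULL
);

CREATE TABLE reading_door_open (
    reading_id bigint PRIMARY KEY REFERENCES readings_6nf (id),
    value      boolean NOT NULL
);

-- Reassembling a row costs one join per field requested:
SELECT r.taken_at, t.value AS temp_c, d.value AS door_open
FROM   readings_6nf r
LEFT JOIN reading_temp_c    t ON t.reading_id = r.id
LEFT JOIN reading_door_open d ON d.reading_id = r.id;
```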

Any guidance would be much appreciated.


More about the data

The data are mostly sensor readings, physical quantities and boolean flags. Which subset of these is present in each row is unpredictable. The index is an integer, and the only field that always exists is the corresponding date.

There would probably be one write for each value and almost no updates. Reads can be frequent and sliced by any of the fields (some are more likely to appear in a WHERE clause than others).
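Since the date always exists, it can at least be a real column with an ordinary B-tree index, leaving only the unpredictable fields in the document. A sketch, reusing the table from above:

```sql
CREATE INDEX readings_taken_at_idx ON readings (taken_at);

-- Time slicing uses the B-tree; the JSONB condition narrows further:
SELECT * FROM readings
WHERE  taken_at >= '2016-11-01' AND taken_at < '2016-12-01'
AND    data @> '{"door_open": true}';
```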

Marco Giancotti
  • Could you give an example of the sort of data you're storing and what you tend to do with it? Or perhaps consider a NoSQL database instead? – Schwern Nov 22 '16 at 04:06
  • @Schwern I've added more details about the data at the end. – Marco Giancotti Nov 22 '16 at 04:53
  • So your concern is that you'll eat up too much disk space with all the nulls? – Schwern Nov 22 '16 at 06:10
  • This is probably better asked on [dba.se](https://dba.stackexchange.com). – Schwern Nov 22 '16 at 06:15
  • Did you consider an array of `decimal`? I would assume that even compressed JSONB will have a higher storage footprint than a native array of numbers - although I am not sure how efficient an array with a lot of null elements is. If that is essentially a key/value storage, did you consider a `hstore` column? (See the sketch after these comments.) – a_horse_with_no_name Nov 22 '16 at 07:13
  • @Schwern, a_horse_with_no_name, thank you. I've [re-framed the question](https://dba.stackexchange.com/questions/157538/numerical-data-with-lots-of-nulls-6nf-or-document-database) more generally on dba.se. – Marco Giancotti Dec 08 '16 at 02:14
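For reference, a minimal sketch of the `hstore` shape suggested in the comments above (names made up; hstore values are text, so numeric filters still need a cast, much like JSONB):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE readings_h (
    id       bigserial PRIMARY KEY,
    taken_at timestamptz NOT NULL,
    data     hstore NOT NULL
);

INSERT INTO readings_h (taken_at, data) VALUES
    ('2016-11-22 04:00+00', 'temp_c => 21.4, door_open => true');

-- hstore values are text, so range filters still require a cast:
SELECT * FROM readings_h WHERE (data->'temp_c')::numeric > 20;
```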

0 Answers