
This is the situation I'm in:

I have a (main) table which has an id and several more columns.
Then there are 3 more tables that only contain an id (JOINed to the id of the main table) and a timestamp.

The ids on all tables are defined as unique, so there's an implicit index on id which speeds up the JOIN. The problem is that if I retrieve the timestamp after joining, it is not in the index, so Postgres is forced into a bitmap index scan plus a bitmap heap scan (or a seq scan, depending on the number of rows) just for that value.

The question is: should I replace the implicit index with one that contains both the id and the timestamp? That index would hold the same data as the table itself, which sounds like a waste of space.
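For concreteness, the layout in question might look like the following sketch (all table and column names are hypothetical, not from the original post; there is deliberately no foreign key, since rows may arrive out of order):

```sql
CREATE TABLE main_table (
    id      bigint PRIMARY KEY,
    payload text                 -- stands in for the several other columns
);

CREATE TABLE event_a (            -- one of the 3 small tables
    id bigint UNIQUE,             -- UNIQUE creates the implicit btree index
    ts timestamptz NOT NULL
);

-- The candidate replacement: a multicolumn index that also covers the
-- timestamp, so the JOIN could be answered by an index-only scan:
CREATE INDEX event_a_id_ts_idx ON event_a (id, ts);
```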

Also, I think I have to keep the original index if I want the UNIQUE check to remain in place.

UPDATE: I forgot to add that these tables have a very high, constant write rate and sporadic reads, but the reads access many rows at once. That is why I'd rather do only inserts instead of updates in place (with everything in one table). The other thing is that rows for these tables might arrive in any order; I cannot guarantee the main-table row will be inserted before the small ones arrive.

Ariel Flesler
  • You must indeed keep the original index to retain the constraint; Pg doesn't have btree indexes that are UNIQUE only on some columns. As for the rest, it's just a trade-off, really. – Craig Ringer Dec 14 '14 at 16:22

1 Answer


You have tables with a UNIQUE constraint on id and only one other column. And the id is joined to main_table.id, also unique, so effectively 1:1 relationships.

And you worry about wasting space with an index?

I suggest you replace those tables entirely with a single nullable column each in the main table. That saves a lot more space and automatically guarantees at most one value per id.
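A sketch of that merged layout (column names are assumed, one nullable timestamp per former secondary table):

```sql
CREATE TABLE main_table (
    id      bigint PRIMARY KEY,
    payload text,               -- the original main-table columns
    ts_a    timestamptz,        -- NULL until the matching event arrives
    ts_b    timestamptz,
    ts_c    timestamptz
);
```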

Don't worry too much about NULL storage; it's extremely cheap compared to additional tables.

If the main table is big and the secondary tables have very few rows (ratio < 1:1000!), or if you have many updates on the secondary tables (whose small rows are cheaper to update than big unified rows), your design starts to make sense from a storage / performance perspective. In that case, yes: you'll need the (implicit) index of the UNIQUE or PK constraint on (id) in addition to a multicolumn index on (id, timestamp) to allow index-only scans.
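That last sentence, sketched as DDL for one secondary table (names are hypothetical). As a side note, PostgreSQL 11 later added covering indexes via INCLUDE, which can serve both purposes with one index, though that option didn't exist when this was written:

```sql
-- Hypothetical secondary table; names are illustrative.
CREATE TABLE event_a (
    id bigint,
    ts timestamptz NOT NULL
);

-- What the answer recommends: the constraint's implicit index ...
ALTER TABLE event_a ADD CONSTRAINT event_a_id_key UNIQUE (id);
-- ... plus a multicolumn index enabling index-only scans:
CREATE INDEX event_a_id_ts_idx ON event_a (id, ts);

-- PostgreSQL 11+ alternative: one unique covering index does both jobs:
-- CREATE UNIQUE INDEX event_a_id_ts_key ON event_a (id) INCLUDE (ts);
```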

Erwin Brandstetter
  • Indeed, I do combine these tables into a single one with NULLs, using a view. The reason I don't do that as I write is that these tables have a very high rate of writes and a non-main row might arrive before the main one. For those reasons I opted for insert-only, to avoid updates and the insert-or-update logic that MySQL has but Postgres lacks. – Ariel Flesler Dec 15 '14 at 17:15
  • @ArielFlesler: I see, those are good reasons. There are solutions for UPSERT (MERGE) [here](http://stackoverflow.com/questions/17152860/upsert-with-a-transaction) or [here](http://stackoverflow.com/questions/17267417/how-do-i-do-an-upsert-merge-insert-on-duplicate-update-in-postgresql/17267423#17267423), but updates are cheaper in the small secondary tables either way. So my last chapter is the answer: you need *both* indexes. – Erwin Brandstetter Dec 16 '14 at 08:09
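A note for later readers: PostgreSQL 9.5 (released after this exchange) added the native UPSERT the comment refers to, via INSERT ... ON CONFLICT. A minimal sketch, assuming a hypothetical secondary table event_a(id, ts) with a UNIQUE constraint on id:

```sql
-- Insert the row, or update the timestamp if the id already exists.
INSERT INTO event_a (id, ts)
VALUES (42, now())
ON CONFLICT (id) DO UPDATE
SET ts = EXCLUDED.ts;
```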