30

I need to be able to store a large list of ordered items in the DB. So far that's straightforward:

ID Position OtherFields
 1     45      ...
 2   4736      ...
 3    514      ...
 ...

In queries, I always need to get just a few items (filtered based on OtherFields), but in the correct order. That's easy as well: put an index on Position and use "ORDER BY Position".

Now the problem: items change their Position frequently, and not just by 1 or 2. If ID 2 moves from Position 4736 to 2000, I need to update its Position and also the Positions of all elements currently between 2000 and 4735, adding 1 to each. And it's not only one ID that changes per transaction but a few, and there can be many transactions within a short time.
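
To make the cost concrete, here is a rough JDBC sketch of that shift (the items table and its id/position columns are just placeholder names). Moving ID 2 from 4736 to 2000 rewrites every row in between:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PositionShift {
    // Move one row to a lower position and shift everything in between up by one.
    // Every row whose position lies in [newPos, oldPos) gets rewritten.
    static void move(Connection conn, long id, int oldPos, int newPos) throws SQLException {
        try (PreparedStatement shift = conn.prepareStatement(
                 "UPDATE items SET position = position + 1 WHERE position >= ? AND position < ?");
             PreparedStatement place = conn.prepareStatement(
                 "UPDATE items SET position = ? WHERE id = ?")) {
            shift.setInt(1, newPos);   // e.g. 2000
            shift.setInt(2, oldPos);   // e.g. 4736 -> rewrites roughly 2736 rows
            shift.executeUpdate();
            place.setInt(1, newPos);
            place.setLong(2, id);
            place.executeUpdate();     // finally place the moved row itself
        }
    }
}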

I think the most elegant way to handle the update problem would be a linked list instead of a Position column: I could remove ID 2 from its old position by linking its predecessor to its successor, and insert it elsewhere by linking it between its new predecessor and successor. That would be a small, constant number of updates per position change, and it would also be my preferred way of handling the changes (in Java in my case). However, this raises the N+1 problem for querying in the correct order: even for a few elements, I may have to walk through the whole list in the worst case to find out their relative order.
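
For comparison, a relink in the linked-list model touches only a handful of rows no matter how far an item moves. A rough sketch, assuming a hypothetical items(id, next_id) table where the tail's next_id is NULL and the item really changes its predecessor:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class LinkedListMove {
    // Re-insert movedId directly after newPrevId: three small updates, independent of distance.
    static void moveAfter(Connection conn, long movedId, long newPrevId) throws SQLException {
        Long oldNext = nextOf(conn, movedId);
        Long newNext = nextOf(conn, newPrevId);
        // 1. unlink: whoever pointed at the moved row now points at its old successor
        update(conn, "UPDATE items SET next_id = ? WHERE next_id = ?", oldNext, movedId);
        // 2. the moved row now points at the new predecessor's old successor
        update(conn, "UPDATE items SET next_id = ? WHERE id = ?", newNext, movedId);
        // 3. the new predecessor points at the moved row
        update(conn, "UPDATE items SET next_id = ? WHERE id = ?", movedId, newPrevId);
    }

    static Long nextOf(Connection conn, long id) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("SELECT next_id FROM items WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getObject(1, Long.class) : null;
            }
        }
    }

    // Note: some JDBC drivers prefer setNull over setObject(null) for the NULL tail pointer.
    static void update(Connection conn, String sql, Object a, Object b) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setObject(1, a);
            ps.setObject(2, b);
            ps.executeUpdate();
        }
    }
}

The ordering problem described above remains, though: reading the list back in order means following next_id row by row.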

So my question is: What would you recommend to get a good balance between necessary updates and query performance?

So far I see two promising directions:

  1. Is there a DBMS (ideally open source) that can handle linked lists not only with syntactic sugar but also with good performance, e.g. by using internal indices for the linked elements?

  2. Maybe it would also be an option to just store the whole linked list in a BLOB! How big could such a linked list get / how much memory would it use in the DB and when fetched, for let's say 1.000.000 entries? I'm using Java + Hibernate in case it matters. I imagine that processing even the whole list in memory after fetching the BLOB should be pretty fast!? (See the sketch below.)
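
For option 2, here is a rough sketch of what packing the order into a single BLOB could look like: one million IDs at 8 bytes each come to about 8 MB as a packed array, while a java.util.LinkedList of boxed Longs would need several times that because of per-node object overhead. This is only an illustration, not Hibernate-specific code:

import java.nio.ByteBuffer;
import java.util.List;

public class OrderBlob {
    // Pack the IDs, already in list order, into a byte[] suitable for a BLOB column.
    static byte[] pack(List<Long> orderedIds) {
        ByteBuffer buf = ByteBuffer.allocate(orderedIds.size() * Long.BYTES);
        for (long id : orderedIds) {
            buf.putLong(id);
        }
        return buf.array();   // one million IDs -> about 8 MB
    }

    // Unpack after fetching the BLOB; walking a million longs in memory is cheap.
    static long[] unpack(byte[] blob) {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        long[] ids = new long[blob.length / Long.BYTES];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = buf.getLong();
        }
        return ids;
    }
}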

But of course other ideas are welcome as well!

Jörg Brenninkmeyer
  • 3,304
  • 2
  • 35
  • 50
  • Bill Karwin has a great answer on this you may want to check out: http://stackoverflow.com/questions/192220/what-is-the-most-efficient-elegant-way-to-parse-a-flat-table-into-a-tree/ – dcp Aug 03 '10 at 18:00
  • I don't think that Bill's "tree" solutions fit. Though the data is hierarchical, it's a single, large tree path, which would have to be updated after each insert, which is equivalent to reindexing. – Marcus Adams Aug 03 '10 at 18:34
  • 1
    I've read that question as well and at first it seemed promising to me, but then I realized that e.g. for the entry on position 1000 of 2000 in total, you would have to create 1000 descendant entries in the Closure Table. In total, I think that would make about 2.000.000 entries for 2000 positions, with lots of necessary updates - too much work on the writing side for a simple list, so I agree with Marcus. – Jörg Brenninkmeyer Aug 04 '10 at 07:53
  • 1
    Similar on the Software Engineering site: [Storing a re-orderable list in a database](https://softwareengineering.stackexchange.com/questions/195308/storing-a-re-orderable-list-in-a-database) – Flux Apr 05 '19 at 23:18

4 Answers

32

If you relax the constraint that the Position column must contain integers from 1 to N and instead allow it to contain any numbers, then you can do both searches and updates efficiently.

You can insert an item between two other items with positions A and B by calculating the average, (A + B) DIV 2. For example, if A is 10000 and B is 12000, your new position is 11000. Occasionally you will run out of gaps due to clustering, at which point you can run through the whole table and redistribute the positions more evenly.
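
A minimal sketch of this in Java, assuming integer (e.g. BIGINT) positions; the renumbering step itself is not shown:

public class GapPosition {
    // Returns a position strictly between a and b (requires a < b),
    // or signals that the gap is exhausted and the table needs renumbering.
    static long between(long a, long b) {
        long mid = a + (b - a) / 2;   // written this way to avoid overflow of (a + b) / 2
        if (mid == a) {
            throw new IllegalStateException("no gap left between " + a + " and " + b
                    + " - redistribute the positions");
        }
        return mid;
    }

    public static void main(String[] args) {
        System.out.println(between(10_000, 12_000));   // 11000, as in the example above
        try {
            between(10_000, 10_001);                   // adjacent values: no room left
        } catch (IllegalStateException e) {
            System.out.println("time to renumber: " + e.getMessage());
        }
    }
}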

Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • 2
    I'm just throwing this out there. Can Position be a float column? You can always divide a number in half. :) – Marcus Adams Aug 03 '10 at 18:12
  • 10
    @Marcus Adams: It could, but you still would eventually hit the same issue of not having a gap due to precision and rounding errors. – Mark Byers Aug 03 '10 at 18:15
  • 2
    It was fun to think about anyway. – Marcus Adams Aug 03 '10 at 18:22
  • That makes total sense, I think I'll try that with a BIGINT. Just read that unsigned it goes from 0 to 18.446.744.073.709.551.615. Even starting with a list of 1.000.000 equally distributed entries, this would leave a "space" of approx. 18.446.744.073.709 entries in between, giving some time before the first collision. In the worst case (all updates always go into the smallest gap, which is always divided in 2), that should be 44 updates before a necessary redistribution, I think (2^44 = 17.592.186.044.416 < 18.446.744.073.709). – Jörg Brenninkmeyer Aug 04 '10 at 08:16
  • 1
    The average is a clever move! Unfortunately there is still the case when you have to redo the distribution... – Vojtiik Aug 19 '13 at 11:57
  • As @Flux already pointed out, I would follow the advice of the answer posted here, https://softwareengineering.stackexchange.com/a/195365/359144. You shouldn't use a hack like this unless it's absolutely necessary for performance. – ATOMP Mar 05 '20 at 20:47
  • If there is any case one would still use this approach, the best way is to use float8 in postgres. In the absolute worst case the ordering will be faulty after 2000+ operations. A little js script to demonstrate: x = 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0; for (let i = 0; i < 2100; i++) {x = x/2; console.log(i, x)} – Kevin Kreps Oct 25 '20 at 17:20
5

What about using a decimal for position? If you do, you could use the following method to put it between other positions:

Original records are:

ID    Position  Otherfields
--------------------------
1     1.0
2     2.0
.
.
.
5000  5000.0

Then say you move ID 1 to just before ID 5000:

ID    Position  Otherfields
--------------------------
1     4999.9
2     2.0
.
.
.
5000  5000.0

Now let's say you want to put ID 2 between IDs 1 and 5000:

ID    Position  Otherfields
--------------------------
1     4999.9
2     4999.91
.
.
.
5000  5000.0

This way you are only changing one record...

UPDATE:

After re-reading @Mark Byers' suggestion it appears that our solutions are very similar, though using a decimal seems much simpler to me...
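
For what it's worth, a small sketch of the decimal variant using java.math.BigDecimal; the average of two terminating decimals is always exact (dividing by 2 always terminates), the trade-off being that values gain digits over time:

import java.math.BigDecimal;

public class DecimalPosition {
    // Position strictly between a and b (assumes a < b); exact, but grows in scale over time.
    static BigDecimal between(BigDecimal a, BigDecimal b) {
        return a.add(b).divide(BigDecimal.valueOf(2));
    }

    public static void main(String[] args) {
        BigDecimal before = new BigDecimal("4999.9");
        BigDecimal after  = new BigDecimal("5000.0");
        System.out.println(between(before, after));   // 4999.95 -> ID 2 slots between 1 and 5000
    }
}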

Abe Miessler
  • 82,532
  • 99
  • 305
  • 486
1

Another solution would be to use lexical (string) ranking. Whenever there is no value left between two strings, you simply append another character. The only disadvantage is that it consumes more memory than the numerical solutions named in the other answers.

Normalization isn't necessary for an error-free database, but it can optionally be done to reduce the string length.

Here is an interesting video, which explains it in detail: https://www.youtube.com/watch?v=OjQv9xMoFbg&feature=youtu.be
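
For illustration, a minimal sketch of the idea (not the exact algorithm from the video), assuming keys only use the characters 'a'..'z' and are always generated by this function, so they never end in 'a' and a gap is always available:

public class LexicalRank {
    // Returns a key strictly between a and b in lexicographic order (requires a < b).
    // When no character fits between the current characters, it appends one instead.
    static String between(String a, String b) {
        StringBuilder key = new StringBuilder();
        for (int i = 0; ; i++) {
            char ca = i < a.length() ? a.charAt(i) : 'a';              // pad a with the lowest char
            char cb = i < b.length() ? b.charAt(i) : (char) ('z' + 1); // pad b just above the highest
            if (cb - ca > 1) {
                key.append((char) ((ca + cb) / 2));   // there is room for a middle character
                return key.toString();
            }
            key.append(ca);                           // no room at this position, keep going
        }
    }

    public static void main(String[] args) {
        System.out.println(between("abc", "abd"));   // abcn
        System.out.println(between("az", "b"));      // azn
        System.out.println(between("b", "c"));       // bn
    }
}

Longer keys only appear where items cluster, which is the extra memory cost mentioned above.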

Kevin Kreps
  • 560
  • 6
  • 18
-1

The following might be helpful. It does not answer your questions directly, but could shed some light on how to do this (if it is possible for your requirements):

ranking entries in mysql table

del.ave
  • 1,888
  • 5
  • 17
  • 23