6

I have a database table that maintains some information and is required to preserve order. Essentially if I have elements 1 through 5 listed and I want to add a new element, then it could be inserted anywhere in the existing row, either at the last, after 5, the beginning before 1 or somewhere in the middle such as after 3. Is there a way to do this using MySQL INSERT statements and specifying after which row we should insert the index?

I presume not. So my strategy to go about doing this is to create another column 'order_number' that basically records the order of the elements. For instance, if the record table has primary key (record_id) and the order_number listed side by side, it would look like this:

 record_id     order_number

    1              1    
    2              2    
    3              3
    4              4
    5              5

TO add a new element to this row after row 3, the resulting end table will look like this:

 record_id     order_number

    1             1
    2             2
    3             3
  **6**         **4**         <------ added row 
    4           **5**         <-- changed order_number
    5           **6**         <-- changed order_number

In such a situation, I can clearly achieve the order that I want by simply selecting the data that i want and providing an Order By order_number asc clause.

However, as you can see, to do a simple Insert, it requires me to update every other row's order_number that appears after it. The table is expected to have an extensive amount of rows to it (magnitude of 100,000) at minimum and simply updating every other row (hence locking the table) at every single insert operation is not at all feasible.

What is a better recommended strategy in this case ?

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
Parijat Kalia
  • 4,929
  • 10
  • 50
  • 77
  • sorry i do not get why the order of order_num changes from 1,2,3,4,5 to 1,2,3,5,6,4 by inserting 1 row. I would have expected 1,2,3,6,4,5. – sailingthoms Dec 02 '12 at 23:17
  • Well when I add a new element, there is 1 more element instead of 6 elements alone. The new element that I have inserted is supposed to be the 4th element in order. If insertion after a row was allowed, then it would essentially be after record_id # 3. But since this is not allowed, it inserts in as record_id #6, but it's order_number should be 4 so that when the elements are selected and presented by asc order, it appears after the record with order_number #3. To ensure this, it is given a record_number called record_number 4. – Parijat Kalia Dec 02 '12 at 23:34
  • This means that the original record with order_number #4 is now 5th in order, and 5th which originally was order_number #5 is now 6th in order hence the order reshuffles to 1,2,3,5(previousl 4),6(previously 5),4(the new row order) – Parijat Kalia Dec 02 '12 at 23:36
  • 3
    @sailingthoms Does my edit make it clear? – ypercubeᵀᴹ Dec 02 '12 at 23:37
  • the edit of the post is quite helpful – sailingthoms Dec 02 '12 at 23:38

2 Answers2

17

If the order_number is not to be shown but only used for ordering, I suggest you use a decimal datatype instead of integer. This way, when you have to insert a row "between" two existing rows, you can set as order_number, the average of the two existing order numbers.

In your example:

 record_id     order_number

    1             1.0
    2             2.0
    3             3.0
  **6**           3.5          <---- added row 
    4             4.0           <-- no change
    5             5.0           <-- no change

There is a problem though that if you keep inserting numbers in the same area, some order numbers may result to be too close for the precision of the datatype you have chosen, close enough as to not be distinguished form one another.

To avoid this, your insert procedure will have to examine whether the two existing order number are too close. In that case, it could reassign some order numbers of other nearby rows, "stretching" the order numbers above and below to "make space" for a new value.

You could also have a "cleanup" procedure that runs periodically and does this "stretching" in the whole or large parts of the table.

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
5

I found this answer for a similar question: https://stackoverflow.com/a/6333717/1010050

In summary, it increments all of the record IDs below the one you will be adding, to maintain consistency. That still requires you to update all of the record IDs, so it isn't the most efficient. It does have the benefit, compared to your method, of maintaining the physical order in the database, rather than just a virtual order like you have.

Another way I can think of would be to record the child and parent record IDs for each record, rather than an order number, similar to a Doubly Linked List. Inserting an element in the middle would then only require updating two other records regardless of table size. This has the same disadvantage as your solution where the physical ordering would be wrong, so reading from the table in an ordered fashion would be more costly.

For example:

record_id        parent_id      child_id
   0                 NULL          1
   1                 0             2
   2                 1             NULL

When we insert a record after record_id = 1, the table becomes:

record_id        parent_id      child_id
   0                 NULL          1
   1                 0             3
   2                 3             NULL
   3                 1             2

Note how only the parent_id and child_id for IDs 1 and 2 had to change.

I think between these two solutions, the biggest thing to consider is what is your most common operation: reading out the values in order, or writing a new value in the middle somewhere. If it's reading, then updating the record IDs would be your best option in order to maintain the physical order of the database. If writing, then you can optimize for that by using the method I suggested similar to a doubly linked list, or your own order method.

Summary after question update: Seeing that updating most of the records is not feasible, then the other answer I found is definitely not valid. The solution of treating it similar to a doubly linked list is still plausible, however.

Community
  • 1
  • 1
Colin Williams
  • 173
  • 1
  • 10
  • This is nice and you can set foreign keys to enforce integrity. If you want to have a query though that uses this linked list for ordering, it will have to be a recursive query. MySQL has not implemented recursive queries so it would have to be some cursor or other procedural code. For lists with 100K elements, it would not very fast. – ypercubeᵀᴹ Dec 03 '12 at 00:02
  • Still, if the problem involves short lists, this is probably the best way. – ypercubeᵀᴹ Dec 03 '12 at 00:04
  • Yep, I ran through that one, I definitely do not want to update all other records simply coz of one insert, that is a performance nightmare given scalability possibilities. Creating a LinkedList still means changing all the other records that are affected by it. The records retrieved are infact rendered as a circular LinkedList in my table. Will be migrating this to MongoDB within a couple of weeks but I needed a makeshift solution until then :) – Parijat Kalia Dec 03 '12 at 00:36
  • 1
    @Parijat: You did not understand what Colin proposes. Every insert will affect 3 rows (maximum), not all. – ypercubeᵀᴹ Dec 03 '12 at 00:46
  • @ypercube Could you not order by parent_id when you query and get the records back in the proper order? This would effectively be like following the pointers through the linked list. Having an index on parent_id would make a query quite fast. – Colin Williams Dec 03 '12 at 04:27
  • Looking into my suggestion further (before it was just a quick thought off the top of my head), that wouldn't work for sorting. There's cases where it would fall down. Alas, @ypercube you are correct in the need of recursive queries to make this remotely efficient on a large dataset such as 100K elements. – Colin Williams Dec 03 '12 at 07:10
  • Good call. It would only work if you were only inserting at the end of the list, which defeats the whole purpose. – Colin Williams Mar 18 '14 at 21:01