505

What's the Hi/Lo algorithm?

I've found this in the NHibernate documentation (it's one method to generate unique keys, section 5.1.4.2), but I haven't found a good explanation of how it works.

I know that Nhibernate handles it, and I don't need to know the inside, but I'm just curious.

Amit Joshi
  • 15,448
  • 21
  • 77
  • 141
DiegoCofre
  • 5,053
  • 3
  • 17
  • 6

5 Answers5

589

The basic idea is that you have two numbers to make up a primary key- a "high" number and a "low" number. A client can basically increment the "high" sequence, knowing that it can then safely generate keys from the entire range of the previous "high" value with the variety of "low" values.

For instance, supposing you have a "high" sequence with a current value of 35, and the "low" number is in the range 0-1023. Then the client can increment the sequence to 36 (for other clients to be able to generate keys while it's using 35) and know that keys 35/0, 35/1, 35/2, 35/3... 35/1023 are all available.

It can be very useful (particularly with ORMs) to be able to set the primary keys on the client side, instead of inserting values without primary keys and then fetching them back onto the client. Aside from anything else, it means you can easily make parent/child relationships and have the keys all in place before you do any inserts, which makes batching them simpler.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • 14
    Are you saying that "low ranges" are coordinated within the client, while the "high sequence" corresponds to a DB sequence? – Chris Noe Jun 30 '09 at 13:15
  • 14
    Are the hi & lo values typically then composed into a single integer value, or as a two-part business key? – Chris Noe Jun 30 '09 at 15:48
  • 54
    like an IP address then - ICANN gives you a high 'network' number, you then have as many low 'host' numbers as you like, within the limit of the CIDR range you're given. – gbjbaanb Aug 07 '09 at 14:19
  • 1
    What is the advantage of the hi/lo algorithm over simply having clients draw *batches* of keys from the central sequence in bulk? – Adam Paynter Aug 09 '10 at 10:57
  • 6
    @Adam: Fundamentally, nothing - it's just potentially cheaper to increment one value (the "high" part) than to generate a bunch of keys. (It's potentially *much* cheaper in terms of data transfer - you can "reserve" a huge number of keys with minimal bandwidth.) – Jon Skeet Aug 09 '10 at 11:15
  • @Jon: True. Though in practice, clients need not *transfer* the batch of keys. The central sequence could merely be incremented by a larger quantity, effectively reserving those keys for the client. – Adam Paynter Aug 09 '10 at 11:31
  • 4
    @Adam: That's true if the keys are just numbers. Not so much for GUIDs :) But yes, in the case of simple numbers, any atomic "increment by a fixed amount" will do. That's effectively what hi-lo is doing, if you think of it as one number split into two sections. – Jon Skeet Aug 09 '10 at 11:36
  • @Jon: Yes, that's effective what hi-lo is doing. You're point about non-numeric keys finally made things click for me. How naive of me, assuming keys are numbers. :) – Adam Paynter Aug 09 '10 at 11:52
  • So it's just a particular way to assign the client a range of primary keys instead of a single primary key, which is desirable for performance reasons? – Craig Gidney Oct 12 '10 at 17:27
  • @Strilanc: That's my understanding, yes. Of course, there could well be subtleties that I'm missing :) – Jon Skeet Oct 13 '10 at 06:12
  • @Jon In the example sequence- 35/0, 35/1, 35/2, 35/3... 35/1023, Shouldn't be it 36 instead of 35? – jai Jun 17 '11 at 01:23
  • 2
    @HanuAthena When Hi is equals to 35 it means that 35 is first unoccupied sequence. All sequences below 35 are already used by other clients. By incrementing Hi value the client reserves 35 to self and notifies other clients that 36 become first unoccupied sequence. – Artemix Oct 11 '11 at 15:27
  • How is the key stored in the db? And I assume the hi number seed for the next user would be stored in some reference table and incremented when requested? – Cory House May 09 '12 at 12:22
  • @CoryHouse: It would be stored in the DB as the complete value, either as one number (the high bits and low bits squashed into one larger field) or potentially as two fields. (There's not a lot of point in doing the latter.) The seed would depend on the database - it could be an Oracle sequence, for example, or just a row in a separate table. – Jon Skeet May 09 '12 at 12:24
  • Depending on data insertion scenarios, it seems it may lead to gaps in id-s and lots of wasted id-s? Is there a version of hi-lo algorithm tacking it? – Yurii Hohan Sep 22 '13 at 09:03
  • 1
    @Hohhi: If you have a large range of IDs (e.g. 64 or 128-bit) gaps really don't matter. If you're limited to a lower range, you allocate fewer per "high" part - e.g. only give them out in chunks of 256. You'll still "waste" some IDs, but in most cases it's *really* not a problem. – Jon Skeet Sep 22 '13 at 12:01
  • @JonSkeet If I didn't misunderstand your mean, the Id'value is generated by hilo in the client without being in the DB server? Is there any good reason to do that ? thanks. – Joe.wang Mar 04 '14 at 11:03
  • 1
    @Joe.wang: You talk to the database to reserve a "chunk" of IDs: it gives you a "hi" value which it won't return to anyone else, and then you populate the "lo" values locally. It means that you just need 1 database call for in order to get a large chunk of IDs, instead of one per entity. – Jon Skeet Mar 04 '14 at 11:41
  • @JonSkeet in the example you mentioned what are the primary keys to be written to database table? are they 35-0, 35-1, or smth? Also how can I define high and low numbers, what are the relationship of these numbers to hibernate_sequence table? – Nurlan Oct 14 '15 at 14:09
  • @Nurlan: You'd either have some way of composing the individual parts into a single number (e.g. by multiplying the high part by 100,000 or something and then adding the low part) or you'd have a composite primary key. I'm not sure what you mean by "define high and low numbers" - and it's been too long since I've worked with Hibernate to give you the details of that aspect. Sorry! – Jon Skeet Oct 14 '15 at 15:30
  • @JonSkeet Can this algorithm be used with other (Micro) ORMs such as Dapper? Is there any additional magic NHibernate does apart from the tables that we create, which will have to be incorporated in Dapper? – Ajay Jadhav Feb 19 '16 at 11:30
  • @AjayJadhav: I don't know anything about Dapper, I'm afraid. – Jon Skeet Feb 19 '16 at 12:38
173

In addition to Jon's answer:

It is used to be able to work disconnected. A client can then ask the server for a hi number and create objects increasing the lo number itself. It does not need to contact the server until the lo range is used up.

Stephan Eggermont
  • 15,847
  • 1
  • 38
  • 65
42

The hi/lo algorithm splits the sequences domain into hi groups. A hi value is assigned synchronously. Every hi group is given a maximum number of lo entries, that can be assigned off-line without worrying about concurrent duplicate entries.

  1. The hi token is assigned by the database, and two concurrent calls are guaranteed to see unique consecutive values

  2. Once a hi token is retrieved we only need the incrementSize (the number of lo entries)

  3. The identifiers range is given by the following formula:

     [(hi -1) * incrementSize) + 1, (hi * incrementSize) + 1)
    

    and the “lo” value will be in the range:

     [0, incrementSize)
    

    being applied from the start value of:

     [(hi -1) * incrementSize) + 1)
    
  4. When all lo values are used, a new hi value is fetched and the cycle continues

And this visual presentation is easy to follow as well:

enter image description here

While hi/lo optimizer is fine for optimizing identifier generation, it doesn't play well with other systems inserting rows into our database, without knowing anything about our identifier strategy.

Hibernate offers the pooled-lo optimizer, which offers the advantages of the hi/lo generator strategy while also providing interoperability with other 3rd-party clients that are not aware of this sequence allocation strategy.

Being both efficient and interoperable with other systems, the pooled-lo optimizer is a much better candidate than the legacy hi/lo identifier strategy.

Vlad Mihalcea
  • 142,745
  • 71
  • 566
  • 911
  • I really don't understand you sometimes hahaha so: While hi/lo optimizer is fine for optimizing identifier generation (Ok good), it doesn't play well with other systems(what do you mean by other systems ?, which are the first ones ?) inserting rows into our database(Doesn't identifier generation used to insert rows too ?) , without knowing anything about our identifier strategy. – Adelin Feb 06 '17 at 21:53
  • 2
    Other systems, like a DBA trying to run an INSERT statement. If she reads the current sequence data, do you think it's easy to figure out the next identifier value knowing we use hilo in this particular DB table? – Vlad Mihalcea Feb 07 '17 at 01:46
  • My apologies if the comment isn't suitable for your answer, but I was wondering what optimizer is used by default? Or does it depend on DB (I'm using PostgreSQL)? Because I cannot figure out relation between current sequence value and generated IDs. I'm using `@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "name") @SequenceGenerator(name="name", sequenceName = "name_seq", allocationSize=100)` for my IDs. – Stefan Golubović Oct 29 '19 at 18:33
  • 1
    @VladMihalcea, I believe you have a typo in bullet three, first snippet at `, (hi * incrementSize) + 1)`... it should be `, hi * incrementSize)`, right? – Huiagan Apr 06 '20 at 17:29
27

Lo is a cached allocator that splits the keyspace into large chunks, typically based on some machine word size, rather than the meaningfully-sized ranges (eg obtaining 200 keys at a time) which a human might sensibly choose.

Hi-Lo usage tends to waste large numbers of keys on server restart, and generate large human-unfriendly key values.

Better than the Hi-Lo allocator, is the "Linear Chunk" allocator. This uses a similar table-based principle but allocates small, conveniently-sized chunks & generates nice human-friendly values.

create table KEY_ALLOC (
    SEQ varchar(32) not null,
    NEXT bigint not null,
    primary key (SEQ)
);

To allocate the next, say, 200 keys (which are then held as a range in the server & used as needed):

select NEXT from KEY_ALLOC where SEQ=?;
update KEY_ALLOC set NEXT=(old value+200) where SEQ=? and NEXT=(old value);

Providing you can commit this transaction (use retries to handle contention), you have allocated 200 keys & can dispense them as needed.

With a chunk-size of just 20, this scheme is 10x faster than allocating from an Oracle sequence, and is 100% portable amongst all databases. Allocation performance is equivalent to hi-lo.

Unlike Ambler's idea, it treats the keyspace as a contiguous linear numberline.

This avoids the impetus for composite keys (which were never really a good idea) and avoids wasting entire lo-words when the server restarts. It generates "friendly", human-scale key values.

Mr Ambler's idea, by comparison, allocates the high 16- or 32-bits, and generates large human-unfriendly key values as the hi-words increment.

Comparison of allocated keys:

Linear_Chunk       Hi_Lo
100                65536
101                65537
102                65538
.. server restart
120                131072
121                131073
122                131073
.. server restart
140                196608

Design-wise, his solution is fundamentally more complex on the number-line (composite keys, large hi_word products) than Linear_Chunk while achieving no comparative benefit.

The Hi-Lo design arose early in OO mapping and persistence. These days persistence frameworks such as Hibernate offer simpler and better allocators as their default.

Thomas W
  • 13,940
  • 4
  • 58
  • 76
  • 1
    Who is mister Ambler? – Apocatastasis Oct 19 '13 at 18:51
  • 1
    Scott Ambler promotes a so-called "hi-lo" allocation strategy using 16- or 32-bit words. Here's his page: http://www.ambysoft.com/scottAmbler.html – Thomas W Oct 19 '13 at 21:51
  • 2
    +1 for an interesting answer. I agree that the vast majority of applications gain no advantage from Hi-Lo over the simpler approach; however I think Hi-Lo is better suited to the special case of multiple allocators in highly concurrent applications. – richj May 10 '14 at 18:08
  • 2
    Thanks @richj! My point is that you can use multiple allocators or large block sizes with "linear block allocation", but that -- unlike Hi/Lo -- it maintains a _linear_ correspondence of allocator NEXT_VAL to keys in the table, and is tuneable. Unlike HiLo, no multiplication is needed -- it's just not necessary! The multiplier & storage of NEXT_HI makes HiLo more complex & breaks tuneability, since changing the blocksize will arbitrarily change the next key to be issued.. See: http://literatejava.com/hibernate/linear-block-allocator-a-superior-alternative-to-hilo/ – Thomas W May 11 '14 at 01:37
  • 3
    I'm interested in multiple independent allocators. With Hi-Lo it's obvious that the high value can be partitioned into allocator ID/block ID. It wasn't immediately obvious (to me) that the same approach can be applied to Linear Chunk, but it is basically the same problem of dividing the total range between allocators. I've got it now. Thanks. – richj May 11 '14 at 17:21
  • 1
    What is the purpose of the SEQ column of table KEY_ALLOC? How is it used? – Rock Anthony Johnson Sep 21 '18 at 19:58
  • 2
    Oh, after thinking about it, I think the SEQ column maps to a table name. For example, there is an allocator the Customers table, one for the Orders table, and so forth. Forgive me, I'm slow, sometimes. – Rock Anthony Johnson Sep 21 '18 at 20:28
2

I found the Hi/Lo algorithm is perfect for multiple databases with replication scenarios based in my experience. Imagine this. you have a server in New York (alias 01) and another server in Los Angeles (alias 02) then you have a PERSON table... so in New York when a person is create... you always use 01 as the HI value and the LO value is the next secuential. por example.

  • 010000010 Jason
  • 010000011 David
  • 010000012 Theo

in Los Angeles you always use the HI 02. for example:

  • 020000045 Rupert
  • 020000046 Oswald
  • 020000047 Mario

So, when you use the database replication (no matter what brand) all the primary keys and data combine easily and naturally without to worry about duplicate primary keys, collissions, etc.

This is the best way to go in this scenario.

Theo
  • 476
  • 6
  • 4
  • 1
    It doens't work in Hibernate. HiLo algrotirm gets a new value of sequence in each transaction, so HI-counter increments accordinally. But in your example, HI-counter is always constant for one DB. – Dmitry1405 Nov 08 '16 at 13:35