I have an application; suppose it's an invoice service. Each time a user creates an invoice, I need to assign the next sequential number (i.e. `ISequentialNumberGeneratorRepository.Next()`). The invoice number must be unique even with several instances of my application running (horizontal scalability is likely in the future).

In other words, I need a global sequential number generator.

Traditionally this problem is solved with a relational database such as SQL Server, PostgreSQL, MySQL, etc., because these systems can generate a sequential unique ID when a record is inserted and return the generated ID as part of the same atomic operation. That makes them a perfect fit for a centralised sequential number generator.

But I don't have a relational database and I don't need one, so it's a bit brutal having to use one just for this tiny functionality.

I have, however, an EventStore available (EventStore.org) but I couldn't find out whether it has sequential number generation capability.

So my question is: is there any available product that I could use to generate unique sequential numbers, so that I can implement my repository's `Next()` method with it, and that would work well regardless of how many instances of my client invoice application I have?

Note: Alternatively, if someone can think of a way to use EventStore for this purpose, or can describe how they achieved this in a DDD/CQRS/ES environment, that would also be great.

Ruben Bartelink
diegosasw
  • 4
    What should happen if something requests one of these numbers and then (for whatever reason) is unable to complete the rest of what it's trying to do and so the number ends up unused? (I.e. a "gap" is introduced in the sequence). If you cannot tolerate gaps then you have a *massive* synchronization convoy that's going to limit your scalability. – Damien_The_Unbeliever Nov 14 '18 at 15:22
  • 3
    I agree with Damien... the requirement for sequential numbers is typically a XY problem indication. It is much easier to apply sequential numbers to all existing data based on a non-sequential sorting condition than to correctly guess the correct sequence number for the future next entry in a parallel environment. – grek40 Nov 14 '18 at 18:06
  • For the sake of this question, it won't happen. Once the next sequence number is provided, the logic that uses it will not fail, so there won't be any gaps. That scenario is out of scope. – diegosasw Nov 14 '18 at 19:41
  • The scenario is the following. I'm using DDD with Event Sourcing. The next available reference number is requested in a domain service in order to generate an event with that reference number in it. The event will be produced. Even if it failed to be produced and the next sequential reference number were "wasted", it wouldn't matter. The question remains: is there any product able to provide sequential numbers to multiple consumers without caring about what the consumers do with that reference number? – diegosasw Nov 14 '18 at 19:42
  • I don't know event-store, but usually we use some kind of autoincrement provided by the database. If the data doesn't need to be sequential and you want to work without autoincrement, we usually use unique identifiers such as a `GUID`. – Christian Gollhardt Nov 19 '18 at 12:47
  • Easiest way: use SQL SEQUENCEs. Or you can implement a simple server/service to provide one. – Mert Gülsoy Nov 19 '18 at 12:51
  • You can reimplement the logic database servers use yourself: make a persistent record that you've reserved a block of X numbers, then hand these out from memory using a suitably fast and atomically safe mechanism (like interlocked increments). The tricky bit is the persistent record, which is the slow part that needs a lock and a flushed write to disk. A complication is that this service should be centrally available, meaning you have some tough choices to make about what happens if it's unreachable. – Jeroen Mostert Nov 19 '18 at 13:11
  • That said, given that SQL Server Express is free, has a well-known client/server protocol that has no problem with scaling, is well supported by libraries for retry mechanisms and consumes minimal resources if it's used only for the task of handing out sequences, I don't think rolling your own really pays off in this case. Even if you wrote the client/server bit yourself because you like HTTP more than TDS, it would probably still be simpler to just use an underlying local server rather than reinvent transactions. – Jeroen Mostert Nov 19 '18 at 13:16
  • 1
    There are two basic ways to do this. First, your database engine already knows how to do this and never gets it wrong, regardless of how many apps add records. Google "sql identity column" for relevant hits. Second is to provide a unique fingerprint for the record that can serve as the primary key, plus a Date column for sequencing. A guid is good for that. – Hans Passant Nov 19 '18 at 14:44
  • @HansPassant You had more options [nine years ago](https://social.msdn.microsoft.com/Forums/windows/en-US/fa5d3d33-7f01-4b7c-a5b6-51db87fa0509/generate-sequence-number-using-c?forum=winforms) ;) – GSerg Nov 19 '18 at 18:03
  • @GSerg That link only lists one solution for solving this problem; the first of the two mentioned in the comment here. – Servy Nov 19 '18 at 22:14
  • Is this application a Windows app, mobile app, or web app? Do the instances live on the same network? Etc. – TheGeneral Nov 24 '18 at 09:05
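
The block-reservation idea from the comments above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual implementation: the class name, the file used as the persistent record, and the block size are all assumptions, and the sketch is only safe inside a single service process (several processes sharing the file would need an additional file lock).

```python
import os
import threading


class BlockReservingGenerator:
    """Hands out numbers from memory, persisting one record per reserved block."""

    def __init__(self, path, block_size=100):
        self._path = path              # hypothetical file holding the high-water mark
        self._block_size = block_size
        self._lock = threading.Lock()  # stands in for an interlocked increment
        self._next = 0
        self._limit = 0                # upper bound of the block currently reserved

    def _reserve_block(self):
        # Slow path: read the persisted high-water mark, advance it by one
        # block, and flush the new value to disk before handing anything out.
        start = 0
        if os.path.exists(self._path):
            with open(self._path) as f:
                start = int(f.read() or 0)
        with open(self._path, "w") as f:
            f.write(str(start + self._block_size))
            f.flush()
            os.fsync(f.fileno())
        self._next, self._limit = start, start + self._block_size

    def next(self):
        # Fast path: an in-memory increment, touching disk once per block.
        with self._lock:
            if self._next >= self._limit:
                self._reserve_block()
            self._next += 1
            return self._next
```

After a restart the generator resumes from the persisted high-water mark, so numbers left over in an unfinished block are skipped. That is acceptable here only because gaps were declared out of scope.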

3 Answers

3

IMHO, your requirement is kinda flawed, because you have conflicting needs.

You want a unique id. The usual solutions use:

  • guid. Can be generated centrally or locally. Really easy to implement. Kinda hard for a human reader, but YMMV. But you want incremental keys.
  • centrally assigned key: you need a transactional system. But you want to do CQRS, and use Event Store. It seems to me that having a separate transactional system just to have an IDENTITY_COLUMN or a SEQUENCE largely misses the point of doing CQRS.
  • use a HiLo generation approach. That is: every single client gets a unique seed (like 1 billion for the first client, 2 billion for the second, etc.), so each client can generate a sequence locally. This sequence is distributed and uses sequential numbers, so there are no concurrency problems, but there is no global ordering of requests, and you must ensure that no two clients get the same Hi value (a relatively easy task).
  • use the id assigned by Event Store. I don't know the product, but every event sent to the queue gets a unique id. But (as I understand it) you require the id to be available BEFORE sending the event.
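
To make the HiLo option in the list above concrete, here is a minimal sketch. It assumes each client has already been given a distinct Hi value by some other means; the class name and the block size of one billion are illustrative only.

```python
import itertools


class HiLoGenerator:
    """Generates ids locally from a client-specific Hi value."""

    def __init__(self, hi, block_size=1_000_000_000):
        self._base = hi * block_size    # e.g. 1 billion for hi=1, 2 billion for hi=2
        self._block_size = block_size
        self._lo = itertools.count(1)   # local counter, no coordination needed

    def next(self):
        lo = next(self._lo)
        if lo >= self._block_size:
            raise OverflowError("Lo range exhausted; a new Hi value is needed")
        return self._base + lo


client_a = HiLoGenerator(hi=1)
client_b = HiLoGenerator(hi=2)
```

Ids from `client_a` and `client_b` can never collide, but an id from `client_b` is always larger than any id from `client_a` regardless of when it was created, which is the missing global ordering mentioned above.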

You can generally mix and match any of these solutions (especially the HiLo algorithm) with timestamps (like seconds since the Unix epoch, or something similar) in order to produce a weak, non-guaranteed sortability. But generally I would avoid this, because if you generate ids on multiple sites you introduce the risk of the clocks being unsynchronized, along with other unsolved (or unsolvable) problems.

Probably I'm missing something, but these are the ones off the top of my head.

So, as far as I can tell, you are at an impasse. I would try really hard to put myself in one of the previous situations.

Alberto Chiesa
  • I don't see what's the conflicting need? I need a centralised sequential number generator independently of the number of client applications I have. And it must be unique as in two requests made at the same time from the same or different instances of my client application should generate different but sequential numbers. And that's all. – diegosasw Nov 19 '18 at 21:47
  • I know relational databases are usually used for this purpose, but it just feels a bit too much to have a whole relational DB engine for this purpose, so I was wondering whether there's another way, maybe with an existing available product. – diegosasw Nov 19 '18 at 21:49
  • And about using CQRS, it doesn't really matter. CQRS is about separating commands from queries in the system. I'm only talking here about business logic during the command phase. No read involved, just business logic and this "centralised sequential number generator" would simply be a repository implementation in my infrastructure layer. – diegosasw Nov 19 '18 at 21:52
  • If CQRS is not relevant, you can omit it from your question. CQRS in its simplest form is _only_ about separation of commands and queries. But CQRS usually lends itself to a very specific implementation: an event queue ingests commands, which are processed in a strictly asynchronous way, while the views are updated later. You're using Event Store and I'm assuming it's not by chance, so I assumed you _need_ fast, asynchronous processing. This would rule out any centralized, transactional storage. If you don't, why are you using Event Store in the first place? Just asking... – Alberto Chiesa Nov 19 '18 at 22:30
  • The Hilo generation approach seems interesting though. But without global sorting I'm afraid I wouldn't know how to make it work for my needs – diegosasw Nov 19 '18 at 22:31
  • If you care to really EXPLAIN your needs, it could be possible to effectively help you ;) Why the need for a sequence? You are describing what you _think_ you need, not what you need to solve. – Alberto Chiesa Nov 19 '18 at 22:32
  • I mentioned event sourcing to point out that I have an event store "database" that I could use as the central sequence number generator. – diegosasw Nov 19 '18 at 22:33
  • Imagine an invoice system. Every invoice needs a sequential number. But I could have many instances of this invoice system, and I don't want repeated invoice numbers being generated. So that's why the sequence number generator should be global – diegosasw Nov 19 '18 at 22:34
  • Exactly: it's a "database" with quotes. You need a transactional system to make transactions. Event Sourcing is a tool that excels in high throughput, low locking scenarios. The kind of requirement that goes in the opposite direction to what you're asking. – Alberto Chiesa Nov 19 '18 at 22:34
  • If it's for an invoice system, you can generate a request key using a Guid, and then generate an Invoice Number only when the invoice is created, in a transactional db. In this way you can use a simple key when performing the creation, and centrally generate a sequential key only "after the fact" – Alberto Chiesa Nov 19 '18 at 22:35
  • Mmm. You're right about my question being misleading, sorry. When I mentioned transaction I didn't mean database transaction. I meant user transaction, as in a command sent to my domain. I'll edit it. – diegosasw Nov 19 '18 at 22:42
  • Please add timestamps to your answer. While not necessarily unique in a multi-threaded or multi-server environment, it may help someone in the future. –  Nov 21 '18 at 05:10
  • @Strom timestamps are not id. However, I added a note about using them to create combined ids with other methods enforcing uniqueness. – Alberto Chiesa Nov 21 '18 at 08:12
  • Timestamps can be a form of id, not necessarily a database ID, but that was not the question. Microsoft uses them for Active Directory replication resolution. –  Dec 01 '18 at 02:16
3

You have not stated the reasons (or presented any code) as to why you want this capability. I will assume the term sequential should be taken as monotonically increasing (sorting, not looping).

I tend to agree with A. Chiesa; I would add timestamps to the list, although they are not applicable here.

Since your post does not indicate how the data is to be consumed, I propose two solutions, the second preferred over the first if possible. Later visitors should use a database solution instead.

The only way to guarantee numerical order across a horizontally scaled application without aggregation is to use a central server to assign the numbers (using REST, RPCs, or custom network code; an SQL server also qualifies, as a side note). Because of concurrency, each application instance must wait its turn for the next number. Together with network usage and latency, this wait limits the scalability of the application, and the central server is a single point of failure. These risks can be reduced by creating multiple instances of the central server and multiple application pools, but you will then lose the global sorting ability.
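
As a rough illustration of the central-server option, the sketch below replaces the network with threads in one process. The names `CentralSequence` and `simulate` are hypothetical; a real deployment would put `next()` behind REST or RPC and would need to persist the counter.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CentralSequence:
    """Single authority for the next number; callers wait their turn on a lock."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def next(self):
        with self._lock:        # the serialization point that limits scalability
            self._value += 1
            return self._value


def simulate(clients=8, requests_per_client=1000):
    """Runs several concurrent clients against one CentralSequence."""
    seq = CentralSequence()

    def client(_):
        return [seq.next() for _ in range(requests_per_client)]

    with ThreadPoolExecutor(max_workers=clients) as pool:
        batches = list(pool.map(client, range(clients)))
    return [number for batch in batches for number in batch]
```

Every number handed out is unique and the combined set has no gaps, but every request passes through the same lock, which is the bottleneck described above.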

As an alternative, I would recommend the HI/LO assignment method combined with batch aggregation. Each instance has a four(?)-digit identifier prefixed to a number that increments per instance. Schedule an aggregation task on one or more central servers (more than one for redundancy) to pick up the data and assign a sequential unique id during aggregation. This process keeps the data local until pickup, which could be scheduled at (100, 500, 1000)? millisecond intervals if needed for coherence, or at intervals of minutes or more if not. It provides almost perfect horizontal scaling, with the drawback of increased vertical scaling requirements at the aggregation server(s).
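
A minimal sketch of the aggregation step, under stated assumptions: records carry an instance identifier, a per-instance counter, and a creation timestamp, and the aggregator orders a batch by timestamp before numbering it. The record fields, the `aggregate` function, and the choice of sort key are illustrative and not specified anywhere in this answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalRecord:
    instance_id: int    # per-instance identifier (rendered as four digits below)
    local_seq: int      # number incremented locally by that instance
    created_at: float   # assumed timestamp, used only as a sorting hint
    payload: str


def local_id(record):
    """Locally unique id: instance identifier prefixed to the local counter."""
    return f"{record.instance_id:04d}-{record.local_seq}"


def aggregate(batch, last_global):
    """Assigns consecutive global numbers to one picked-up batch.

    Returns a mapping from local id to global sequential number, continuing
    after last_global (the highest number assigned by earlier batches).
    """
    ordered = sorted(
        batch, key=lambda r: (r.created_at, r.instance_id, r.local_seq)
    )
    return {
        local_id(r): last_global + offset
        for offset, r in enumerate(ordered, start=1)
    }
```

Because numbering happens in a single place after the fact, instances never wait on each other. The cost is that a record has no global number until its batch has been aggregated.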

Distributed computing is a balancing act, between processing, memory, and communication overhead. Where your computing/memory/network capacity boundaries lie cannot be determined from your post.

There is no single correct answer. I have provided you with two possibilities, but without specific requirements of the task at hand, I can go no further.

0

This is a strange opinion:

so it's a bit brutal having to use one just for this tiny functionality.

Today SQLite is used as a relational database even in mobile phones. It is simple, has a small memory footprint, and has bindings for all popular programming languages. 20 years ago databases consumed a lot of resources; today you can find a database engine for every task. Also, if you need a tiny key-value store you can use BerkeleyDB.
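
For illustration, a sequence can be built on SQLite with very little code. This is a sketch using Python's standard `sqlite3` module; the table name `invoice_seq` and the helper names are made up for the example, and SQLite is an embedded engine, so instances on different machines would still need a small service in front of it.

```python
import sqlite3


def open_sequence(path=":memory:"):
    """Opens (or creates) a database holding a single auto-incrementing table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoice_seq "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT)"
    )
    conn.commit()
    return conn


def next_number(conn):
    """Inserts an empty row and returns the id SQLite generated for it."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute("INSERT INTO invoice_seq DEFAULT VALUES")
        return cursor.lastrowid
```

Each call to `next_number` runs in its own transaction, so the database, not the application, is responsible for never handing out the same number twice.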

Alexei Shcherbakov
  • 1
    This adds a lot of overhead. Especially in a horizontal scaling situation where network overhead is an issue. –  Dec 01 '18 at 02:19