I need to generate a unique sequence number from multiple threads. I created the simple class below and it seems to work, but I'm not certain if I can rely on the sequence number being unique.

In addition, I need to be able to have the number go back to 0 if it exceeds 999999. I don't expect it to roll over for a very long time as the method will likely be called less than 100 times per day. I know that the system will be shut down periodically for maintenance before it has a chance to reach 999999.

The GetSequenceNumber method will be called from an xslt transformation in a BizTalk map and the method could be called more than once at the same time (within the same BizTalk host instance).

On my dev system, it seems to work correctly and generates different values even if BizTalk is calling the method more than once at the same time. The dev system only has a single BizTalk host instance running.

On the production system, however, there are two servers. Am I right in thinking that this method cannot guarantee uniqueness across servers since they are running in different App Domains?

I can't use a GUID because the sequence number is limited to 16 characters.

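    using System;
    using System.Threading;

    public class HelperMethods
    {
        // Shared by all callers in the same AppDomain.
        private static int sequenceNumber = 0;

        public string GetSequenceNumber()
        {
            // Atomic increment: no two threads in this AppDomain get the same value.
            int seqNo = Interlocked.Increment(ref sequenceNumber);

            // 10-digit timestamp + 6-digit counter = 16 characters (while seqNo <= 999999).
            return string.Format("{0:MMddHHmmss}{1:000000}", DateTime.Now, seqNo);
        }
    }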

I thought that I might be able to use the server's computer name and prepend some arbitrary character, so that even if the sequence number generated on one server was the same as on the other, the results would still differ. I'm not sure how unique that would be. Something like this:

    string seqNumber = (Environment.MachineName == "Blah" ? "A" : "B") + GetSequenceNumber();

Does anyone have any suggestions as to how I can create a unique sequence number? It doesn't have to be perfectly unique, I just need collisions to be very unlikely. Also, how can I reset the number back to 0 if it reaches 1000000 in a thread safe way?

Chris Dunaway
  • Does it need to reset at `1000000`, or is that an arbitrary high number you picked? For example, would it rolling over at `int.MaxValue` also be acceptable? What about negative numbers? – Rob Mar 27 '17 at 22:33
  • If you are sure you will never reach `int.MaxValue`, increment the counter like you do now, but return `seqNo % 1000000`. – Jakub Lortz Mar 27 '17 at 22:40
  • Would this work for you? `Convert.ToBase64String(Guid.NewGuid().ToByteArray()).Substring(0, 16)` – Enigmativity Mar 27 '17 at 22:43
  • @Enigmativity As far as I remember, simply truncating a GUID effectively destroys any 'guarantee' of non-collisions you'd gain by using a GUID. See [here](http://stackoverflow.com/questions/5678177/how-to-generate-8-bytes-unique-id-from-guid) – Rob Mar 27 '17 at 22:48
  • @Rob - Yes, quite probably. I was looking to do some folding of the byte array with some xor'ing. That should then probably be sufficient. – Enigmativity Mar 27 '17 at 23:55
  • @Rob - I've added an answer with the folding of the values of a `Guid`. – Enigmativity Mar 28 '17 at 00:07
  • **HOLD ON!** Why are you trying to do this? Can you describe the business or technical scenario that's driving this? Outside one circumstance, this is a very, very unusual requirement. – Johns-305 Mar 28 '17 at 14:45
  • @Rob - I picked 999999 as my max because part of the number is already being filled with a datetime stamp and I just needed 16 digits. – Chris Dunaway Mar 28 '17 at 18:25
  • @Johns-305 - The use case is as follows: BizTalk receives a file with multiple messages in it. It splits the messages apart and processes each of them in parallel. Each message is mapped using an xslt transform and the sequence number is generated as part of this mapping. Because the messages are processed in parallel, we have had collisions with the number. The current map just uses a timestamp down to several fractions of a second, but we still had collisions. – Chris Dunaway Mar 28 '17 at 18:28
  • @ChrisDunaway Well, I was more asking about why you need a sequence number. Outside of EDI, that's quite unusual. Is this just a unique ID? – Johns-305 Mar 28 '17 at 19:05
  • Usually when you want a unique number per message you want the Sending system to set that. Otherwise with re-tries there is the potential of the same message being given two or more numbers. – Dijkgraaf Mar 28 '17 at 19:54
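
The rollover suggested in the comments above can be sketched in two ways. This is only an illustration under assumptions: the class name `WrappingSequence` and both method names are hypothetical, and the counter is still local to one AppDomain, so it does nothing for uniqueness across servers.

```csharp
using System;
using System.Threading;

public static class WrappingSequence
{
    private const int Modulus = 1000000; // results stay in 0..999999
    private static int rawCounter = -1;  // first NextByModulo() call returns 0
    private static int wrapped = -1;     // first NextByCompareExchange() call returns 0

    // Option 1: increment freely and reduce modulo on the way out.
    // Interlocked.Increment wraps around on overflow instead of throwing;
    // the unchecked cast to uint keeps the result non-negative after that.
    // (At the 2^32 wrap the sequence jumps, because 2^32 is not a
    // multiple of 1,000,000, but the value stays in range.)
    public static int NextByModulo()
    {
        int raw = Interlocked.Increment(ref rawCounter);
        return (int)(unchecked((uint)raw) % (uint)Modulus);
    }

    // Option 2: store the already-wrapped value and retry with
    // CompareExchange until no other thread has changed it in between.
    public static int NextByCompareExchange()
    {
        while (true)
        {
            int current = Volatile.Read(ref wrapped);
            int next = (current + 1) % Modulus;
            if (Interlocked.CompareExchange(ref wrapped, next, current) == current)
            {
                return next;
            }
        }
    }
}
```

Either value can be dropped into the original format string in place of `seqNo`. Option 1 is a single atomic operation per call; option 2 keeps the stored counter itself inside the 0..999999 range.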

3 Answers

This should do a fairly good job of returning a unique string of 16 characters. It's based on a Guid being unique. Because a Guid becomes a 24-character string under `Convert.ToBase64String`, the code folds the byte array over itself with an XOR before truncating, so that every byte of the Guid contributes to the 16 characters that are kept.

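    // Requires: using System; using System.Linq;
    Guid gd = Guid.NewGuid();
    byte[] ba = gd.ToByteArray();

    // XOR each byte with its mirror image (byte i with byte 15 - i).
    // The folded array is symmetric, so at most 8 bytes carry information.
    ba = ba.Zip(ba.Reverse(), (b0, b1) => (byte)(b0 ^ b1)).ToArray();

    // 16 bytes encode to 24 Base64 characters; keep the first 16.
    string mostLikelyUnique16 = Convert.ToBase64String(ba).Substring(0, 16);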

I get results like 8QBIi7JpCeHhCWmy.

There's no guarantee of uniqueness, but I would think it'd be fairly good, especially given your requirements allowing for occasional collisions.

I did a simple test and produced 1 million values without any collision.
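
A collision check along those lines might look like the sketch below. It is not the exact test that was run; `FoldedGuidCheck`, `NextId` and `CountCollisions` are hypothetical names, and `Enumerable.Reverse` is called explicitly only to avoid any ambiguity with span-based `Reverse` overloads on newer compilers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FoldedGuidCheck
{
    // Same folding as in the snippet above: XOR byte i with byte 15 - i,
    // Base64-encode, keep the first 16 characters.
    public static string NextId()
    {
        byte[] ba = Guid.NewGuid().ToByteArray();
        ba = ba.Zip(Enumerable.Reverse(ba), (b0, b1) => (byte)(b0 ^ b1)).ToArray();
        return Convert.ToBase64String(ba).Substring(0, 16);
    }

    // Generates `count` IDs and returns how many were duplicates.
    public static int CountCollisions(int count)
    {
        var seen = new HashSet<string>();
        int collisions = 0;
        for (int i = 0; i < count; i++)
        {
            // HashSet.Add returns false when the value is already present.
            if (!seen.Add(NextId()))
            {
                collisions++;
            }
        }
        return collisions;
    }
}
```

Calling `FoldedGuidCheck.CountCollisions(1000000)` repeats the experiment. Since the folded value carries at most 64 bits, a clean run over a million values is the expected outcome and shows that collisions are rare, not that they are impossible. Note also that Base64 output can contain `+` and `/`, which matters if the ID has to be alphanumeric.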

Enigmativity
  • This is cool, but why not just use a GUID if you're going to not use a sequenced set of integers? – Dan Field Mar 30 '17 at 03:38
  • @DanField - Because the OP had a 16 character limit for the unique identifier and Guids become 24 characters long when Base64-encoded. – Enigmativity Mar 30 '17 at 05:13
  • Yes, if he really needs a unique 16 character string this works, but I was under the impression he needed a number that would loop back around (a true sequence). I'm also under the impression the character length of the number is less important than that it have numeric value - but again, if a 16 character string works this is a great solution. – Dan Field Mar 30 '17 at 13:16

If the number of servers is known in advance and not very large, you could reserve the most significant digits for a "server ID" that is set in each server's configuration. For instance, in a set of 10 servers, one of them would only work with numbers in the range 3000000-3999999. In a set of 100 servers, one server's range would be 4200000-4299999.

The main thing is to have information in the configuration that tells the servers apart.
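
A minimal sketch of the 10-server case, assuming a single-digit server ID is read from each server's configuration and passed in. The class name `ServerScopedSequence` is hypothetical, and how the ID is stored in configuration is left open.

```csharp
using System;
using System.Threading;

public class ServerScopedSequence
{
    private const int RangeSize = 1000000; // each server owns 1,000,000 values
    private readonly int serverId;         // assumed to come from per-server configuration
    private int counter = -1;              // first Next() call yields local value 0

    public ServerScopedSequence(int serverId)
    {
        if (serverId < 0 || serverId > 9)
        {
            throw new ArgumentOutOfRangeException(nameof(serverId));
        }
        this.serverId = serverId;
    }

    // Returns a value in [serverId * 1,000,000, serverId * 1,000,000 + 999,999].
    // The local part wraps back to 0 after 999,999.
    public int Next()
    {
        int raw = Interlocked.Increment(ref counter);
        int local = (int)(unchecked((uint)raw) % (uint)RangeSize);
        return serverId * RangeSize + local;
    }
}
```

With `new ServerScopedSequence(3)`, the values start at 3000000 and never leave the 3000000-3999999 range, so two servers with different IDs cannot produce the same number. The counter still lives in one AppDomain, so it restarts whenever that AppDomain is recycled.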

Dialecticus
  • This isn't a bad idea, except that BizTalk will run his maps in separate AppDomains that he has no control over, which the server may tear down without notice, destroying any static data he's using to track where his sequence is. – Dan Field Mar 30 '17 at 02:29

You'll have several problems here.

BizTalk Pipelines and Orchestrations run in separate AppDomains, and the Orchestration AppDomain gets torn down sometimes between hydrations of your orchestration. Even if, for right now, this map only ever runs in an orchestration xor a pipeline, at some point someone may put it on another one (or on another host instance), and your solution goes boom. And forget about it when you start running anything in a multi-server environment.

The bottom line on BizTalk maps is that they typically shouldn't be non-deterministic in any way that depends on external resources. If you just want to generate a GUID (which is non-deterministic), that's OK. If you want to generate a sequence across multiple mappings, that's not a good idea. You'd have to rely on some external system, and it really should be external even to the BizTalk AppDomain you're running in to work properly (such as a singleton WCF service or a SQL Server SEQUENCE). Calling out like that is generally a bad idea in a map, especially a map that's going to execute in parallel on multiple systems.

So what can you do?

  1. Pick a different way of identifying your debatched messages. Perhaps a Batch ID and an individual ID - and perhaps both GUID (or at least something derived from a GUID that's still good-enough for uniqueness).
  2. Sequence your messages before batching or on the way out, through an Ordered Delivery port or a singleton orchestration (effectively a singleton queue). This has some obvious disadvantages, but you can get a little creative with it - map your entire batch first to sequence it, then debatch; or rework your debatch process to construct a sequence in the message as it debatches; or throw everything into a SQL (2012+) table with a SEQUENCE that loops around per your requirement and pull it back out with your sequence.
  3. Rethink your requirement. What's the end system requesting this? Why are you thinking you need to do it in a map?
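
The idea behind options 1 and 2, numbering the messages once while the batch is still in one piece, can be illustrated in plain C#. This is not BizTalk API code: `BatchSequencer` is a hypothetical name and the messages are represented as strings purely for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchSequencer
{
    // Tags every message with (batch ID, position in batch, payload).
    // This runs once per batch, before the messages fan out to parallel
    // processing, so no shared static counter is needed.
    public static List<Tuple<Guid, int, string>> Sequence(IEnumerable<string> messages)
    {
        Guid batchId = Guid.NewGuid();
        return messages
            .Select((msg, index) => Tuple.Create(batchId, index, msg))
            .ToList();
    }
}
```

Each debatched message then carries an identifier that was fixed before parallel mapping started, so the map itself stays deterministic for a given input.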
Dan Field
  • How about a web service that returns a range of sequences? – Dialecticus Mar 30 '17 at 09:03
  • That only solves part of one of the problems - you still don't know which AppDomain an individual map will be executing on, and if you're going to do that you'd have to call it before executing the individual maps anyway, which leads to the idea of just doing the sequencing before debatching or after a rebatching stage – Dan Field Mar 30 '17 at 13:14