I'm trying to write a regex pattern to validate Unique Transaction Identifiers (UTI). See description: here

The UTI consists of two concatenated parts, the prefix and the transaction identifier. Here is a summary of the rules I'm trying to take into account:

  • The prefix is exactly 10 alphanumeric characters.
  • The transaction identifier is 1-32 characters long.
  • The transaction identifier is alphanumeric, but the following special characters are also allowed: . : _ -
  • The special characters cannot appear at the beginning or end of the transaction identifier.
  • Two special characters cannot appear in a row.

So far I have constructed a pattern that validates the first 4 of these points (matched case-insensitively):

^[A-Z0-9]{11}((\w|[:\.-]){0,30}[A-Z0-9])?$

However I'm struggling with the last point (no two special characters in a row). I readily admit to being a bit of a novice when it comes to regex and I was thinking there might be some more advanced technique that I'm not familiar with to accomplish this. Any regex gurus out there care to enlighten me?


Solved: Thanks to user Bohemian for helping me find the pattern I was looking for. My final solution looks like this:

^[a-zA-Z0-9]{11}((?!.*[.:_-]{2})[a-zA-Z0-9.:_-]{0,30}[a-zA-Z0-9])?$
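For anyone who wants to sanity-check the pattern, here is a minimal Python sketch that runs it against each of the five rules. The helper name `is_valid_uti` and the sample prefix are made up for illustration and are not part of any UTI specification. `fullmatch` is used so that a trailing newline cannot slip past `$`.

```python
import re

# Final pattern from the question, copied verbatim.
UTI_PATTERN = re.compile(
    r"^[a-zA-Z0-9]{11}((?!.*[.:_-]{2})[a-zA-Z0-9.:_-]{0,30}[a-zA-Z0-9])?$"
)


def is_valid_uti(candidate: str) -> bool:
    """Return True if the whole string matches the UTI pattern."""
    return UTI_PATTERN.fullmatch(candidate) is not None


PREFIX = "ABCDE12345"  # hypothetical 10-character prefix, for illustration only

samples = {
    PREFIX + "X": True,          # 1-character transaction identifier
    PREFIX + "A.B_C-D:E": True,  # single special characters between alphanumerics
    PREFIX + "A" * 32: True,     # maximum length (32)
    PREFIX + "A" * 33: False,    # one character too long
    PREFIX: False,               # transaction identifier missing
    PREFIX + ".ABC": False,      # special character at the start
    PREFIX + "ABC.": False,      # special character at the end
    PREFIX + "AB.-CD": False,    # two special characters in a row
}

for uti, expected in samples.items():
    assert is_valid_uti(uti) == expected, uti
```

The `{11}` covers the 10-character prefix plus the mandatory first character of the transaction identifier, which is why a special character can never appear in the first position.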

I will leave the question open for follow-up answers in case anyone has any further suggestions for improvements.

Rune Aamodt
  • Your regex suggests that letters must be uppercase. Is that true? Lowercase letters are "alphanumeric" too. – Bohemian Oct 16 '17 at 15:38
  • @Bohemian: Yes, I'm actually running the matching engine with ignored casing enabled, I made a small remark about it. – Rune Aamodt Oct 16 '17 at 15:56

1 Answer

Try this:

^[A-Z0-9]{11}(?!.*[.:_-]{2})[A-Z0-9.:_-]{0,30}[A-Z0-9]$

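The secret sauce is the negative lookahead (?!.*[.:_-]{2}), which asserts (without consuming input) that the text that follows does not contain two consecutive "special" characters from the set . : _ -. As written, the pattern requires at least two characters in the transaction identifier; to accept a single-character identifier, make everything after the {11} an optional group.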
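To see the "without consuming input" part in action, here is a small Python sketch that runs the lookahead on its own. The variable names are just for illustration.

```python
import re

# The lookahead in isolation. It matches the empty string at the current
# position only if no two special characters are adjacent further on.
no_double_special = re.compile(r"(?!.*[.:_-]{2})")

ok = no_double_special.match("AB.CD")
bad = no_double_special.match("AB.-CD")

# The assertion succeeds on "AB.CD" but consumes zero characters.
print(ok is not None, ok.end())
# ".-" is found ahead in "AB.-CD", so the assertion fails and there is no match.
print(bad is None)
```

Because the match position does not move, the character classes that follow the lookahead in the full pattern still get to consume the same text.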


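Note that your attempt, which uses \w, allows underscores (and lowercase letters, even without the ignore-case flag), because \w is equivalent to [a-zA-Z0-9_]. In Unicode-aware engines \w can match even more than that, so an explicit character class is the safer choice.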
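A quick way to check what \w accepts in a given engine is to test it directly. The sketch below uses Python 3, where \w on str patterns is Unicode-aware unless the re.ASCII flag is passed; other engines may behave differently.

```python
import re

# \w accepts the underscore.
underscore = re.fullmatch(r"\w", "_")

# In Python 3, \w also accepts non-ASCII letters by default...
accented_unicode = re.fullmatch(r"\w", "\u00e9")
# ...but not when restricted to ASCII.
accented_ascii = re.fullmatch(r"\w", "\u00e9", flags=re.ASCII)

# An explicit class rejects the underscore, even when matching case-insensitively.
explicit = re.fullmatch(r"[A-Z0-9]", "_", flags=re.IGNORECASE)

print(underscore is not None)
print(accented_unicode is not None)
print(accented_ascii is None)
print(explicit is None)
```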

Bohemian
  • Thanks, that was exactly what I was looking for! I made a minor adjustment, posting my final solution in an edit above. – Rune Aamodt Oct 16 '17 at 16:08