8

I have the following regex to parse the ISINs of bonds, assets, etc. (2 letters followed by 10 letters or digits):

([A-Z]{2})([A-Z0-9]{10})

But this also matches a word like ABCDEFGHIJKL, which is not a real ISIN. A definition of ISINs is here: WIKI

So some examples are US45256BAD38, US64118Q1076, XS0884410019. What would be the correct RegEx to search for them, without matches like ABCDEFGHIJKL?

Maybe a regex that requires at least one digit?

Arvind Kumar Avinash
ZerOne

3 Answers

18

If you cannot use lookahead, you can instead check that the last character is a digit, since according to the Wikipedia definition it is the check digit.

ISINs consist of two alphabetic characters, which are the ISO 3166-1 alpha-2 code for the issuing country, nine alpha-numeric characters (the National Securities Identifying Number, or NSIN, which identifies the security, padded as necessary with leading zeros), and one numerical check digit.

Source: https://en.wikipedia.org/wiki/International_Securities_Identification_Number#Description

Meaning this also could work:

([A-Z]{2})([A-Z0-9]{9})([0-9]{1})
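
As a quick check of the pattern above, here is a minimal Python sketch that runs it against the examples from the question. The names `ISIN_PATTERN` and `samples` are made up for illustration, and Python's `re` module stands in for whatever engine you actually use; the pattern needs no lookahead, so it should also carry over to engines without that feature.

```python
import re

# Pattern from the answer: country code, 9-character NSIN, numeric check digit.
ISIN_PATTERN = re.compile(r"([A-Z]{2})([A-Z0-9]{9})([0-9]{1})")

# The three real ISINs and the false positive from the question.
samples = ["US45256BAD38", "US64118Q1076", "XS0884410019", "ABCDEFGHIJKL"]

# Keep only the samples in which the pattern finds a match.
matches = [s for s in samples if ISIN_PATTERN.search(s)]
print(matches)
```

The all-letter word is rejected because its last character is not a digit.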
sipho102
0

You can use a lookahead regex:

\b([A-Z]{2})((?![A-Z]{10}\b)[A-Z0-9]{10})\b

RegEx Demo

(?![A-Z]{10}\b) is a negative lookahead that fails the match if all 10 characters after the first 2 are letters.
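
To see the lookahead in action, here is a small sketch in Python, whose `re` module supports lookaheads. The sample sentence and the names `LOOKAHEAD_PATTERN`, `text`, and `found` are invented for illustration.

```python
import re

# Lookahead pattern from this answer, unchanged.
LOOKAHEAD_PATTERN = re.compile(r"\b([A-Z]{2})((?![A-Z]{10}\b)[A-Z0-9]{10})\b")

# Hypothetical text mixing real ISINs with an all-letter 12-character word.
text = "Holdings: US45256BAD38, ABCDEFGHIJKL and XS0884410019."

# Collect every full match; the all-letter word is skipped by the lookahead.
found = [m.group(0) for m in LOOKAHEAD_PATTERN.finditer(text)]
print(found)
```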

anubhava
  • Thanks! This works fine in the demo, but I use it in PLSQL with REGEXP_SUBSTR and there it fails.. do you know if REGEXP_SUBSTR does not support lookaheads? – ZerOne Oct 16 '15 at 08:15
  • Hmm you didn't mention this in question. I'm not sure if Oracle regex supports lookahead. Check documentation. – anubhava Oct 16 '15 at 08:36
  • Yes I thought that this would be no problem for Oracle, but in fact Oracle doesn't support lookahead and lookbehind. You answered my question so thanks! I just asked the wrong question – ZerOne Oct 16 '15 at 10:03
  • The regex does what it says, but is incorrect for determining valid ISIN values. – enharmonic Aug 13 '21 at 18:50
  • @enharmonic: This answer just addresses the problem that OP is facing, nothing more was attempted. – anubhava Aug 13 '21 at 19:07
0

Do not forget the ^ and $ anchors, which prevent the pattern from accepting longer strings like 'something_AS1234567890_anything_else':

^([A-Z]{2})([A-Z0-9]{9})([0-9]{1})$
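
A minimal Python sketch of whole-string validation with the anchored pattern follows. The check-digit step is not part of this thread: it assumes the usual ISIN rule (letters mapped to 10..35, then the Luhn algorithm over the resulting digit string), and the function names `has_valid_check_digit` and `is_isin` are hypothetical.

```python
import re

# Anchored pattern from this answer.
ANCHORED_PATTERN = re.compile(r"^([A-Z]{2})([A-Z0-9]{9})([0-9]{1})$")


def has_valid_check_digit(isin: str) -> bool:
    """Assumed rule: expand letters to 10..35, then apply the Luhn check."""
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for i, d in enumerate(reversed(digits)):
        n = int(d)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def is_isin(candidate: str) -> bool:
    # fullmatch also rejects a trailing newline, which `$` alone tolerates in Python.
    if ANCHORED_PATTERN.fullmatch(candidate) is None:
        return False
    return has_valid_check_digit(candidate)


print(is_isin("US64118Q1076"))
print(is_isin("something_AS1234567890_anything_else"))
```

The regex alone only checks the shape of the string; the extra step rejects strings that look like an ISIN but carry the wrong check digit.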

Emanuele Pepe