Is there a simple regex pattern that detects IBANs in a text (including country-specific formatting)?

Currently I am finding IBANs with this regex:

[a-zA-Z]{2}[0-9]{2}[0-9a-zA-Z]{10,30}

But this doesn't match formatted IBANs such as the following (as expected, since I haven't added any whitespace handling):

FR76 30003 02420 002202XXXXX 77

or

PT50 0002 0123 1234 5678 9015 4

Can you help me? Where can I find the formatted IBAN patterns for every country?

Example :

"My IBAN is PT50 0002 0123 1234 5678 9015 1 catch it with a regex and these one PT50000201231234567890151 too !"

I would like to extract/process "PT50 0002 0123 1234 5678 9015 1" and "PT50000201231234567890151".

Edit: Solution 1 - Very long pattern:

((NO)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{3}|(NO)[0-9A-Z]{13}|(BE)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}|(BE)[0-9A-Z]{14}|(DK|FO|FI|GL|NL)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{2}|(DK|FO|FI|GL|NL)[0-9A-Z]{16}|(MK|SI)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{3}|(MK|SI)[0-9A-Z]{17}|(BA|EE|KZ|LT|LU|AT)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}|(BA|EE|KZ|LT|LU|AT)[0-9A-Z]{18}|(HR|LI|LV|CH)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{1}|(HR|LI|LV|CH)[0-9A-Z]{19}|(BG|DE|IE|ME|RS|GB)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{2}|(BG|DE|IE|ME|RS|GB)[0-9A-Z]{20}|(GI|IL)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{3}|(GI|IL)[0-9A-Z]{21}|(AD|CZ|SA|RO|SK|ES|SE|TN)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}|(AD|CZ|SA|RO|SK|ES|SE|TN)[0-9A-Z]{22}|(PT)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{1}|(PT)[0-9A-Z]{23}|(IS|TR)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{2}|(IS|TR)[0-9A-Z]{24}|(FR|GR|IT|MC|SM)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{3}|(FR|GR|IT|MC|SM)[0-9A-Z]{25}|(AL|CY|HU|LB|PL)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}|(AL|CY|HU|LB|PL)[0-9A-Z]{26}|(MU)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{2}|(MU)[0-9A-Z]{28}|(MT)[0-9A-Z]{2}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{4}[ ][0-9A-Z]{3}|(MT)[0-9A-Z]{29})

And it doesn't work for the French-specific pattern.

Aure77
  • https://github.com/arhs/iban.js – BenM Nov 15 '16 at 17:19
  • Why not strip all whitespace then apply the regex? – Mr. Llama Nov 15 '16 at 17:19
  • As mentioned in the title, I want to detect IBANs in a large text (to process them), not validate them. – Aure77 Nov 15 '16 at 17:21
  • Possible duplicate of [IBAN Validation check](http://stackoverflow.com/questions/21928083/iban-validation-check) – Heretic Monkey Nov 15 '16 at 17:24
  • When you search for "IBAN regex javascript" there are all sorts of solutions from simple regexes up to full IBAN validator libraries. Please pick an existing solution before rolling your own. – Tomalak Nov 15 '16 at 17:26
  • I only found regexes for validating an IBAN string or detecting unformatted IBANs. But my question is how to detect formatted AND unformatted IBANs in a large document. – Aure77 Nov 15 '16 at 17:32

2 Answers

The regex you are using is not correct for valid IBANs. Use this regex instead:

 [a-zA-Z]{2}[0-9]{2}[a-zA-Z0-9]{4}[0-9]{7}([a-zA-Z0-9]?){0,16}

FR76 30003 02420 002202XXXXX 77

PT50 0002 0123 1234 5678 9015 4

Source: http://snipplr.com/view/15322/iban-regex-all-ibans/
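As a rough sketch of how the pattern above could be applied to the two sample strings: it contains no whitespace handling, so the spaces have to be removed first. The helper name `looksLikeIban` and the added `^`/`$` anchors are my own assumptions, not part of the linked snippet. Note that this only works on a single candidate string; stripping whitespace from a whole text would glue neighbouring words onto the match.

```javascript
// The pattern from this answer, anchored (assumption) so that it must
// cover the whole candidate string.
const IBAN_SHAPE = /^[a-zA-Z]{2}[0-9]{2}[a-zA-Z0-9]{4}[0-9]{7}([a-zA-Z0-9]?){0,16}$/;

// Hypothetical helper: removes all whitespace from ONE candidate string,
// then tests it against the pattern. It checks the shape only, not the checksum.
function looksLikeIban(candidate) {
    const compact = candidate.replace(/\s+/g, '');
    return IBAN_SHAPE.test(compact);
}

console.log(looksLikeIban('PT50 0002 0123 1234 5678 9015 4')); // true
console.log(looksLikeIban('FR76 30003 02420 002202XXXXX 77')); // true
console.log(looksLikeIban('not an IBAN'));                     // false
```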

For more information about IBAN format:

Wikipedia

https://en.wikipedia.org/wiki/International_Bank_Account_Number

Edit:

For a more complex validator, check this code https://jsfiddle.net/kf332bhj/1/

To handle spaces, detect the country code and build the regex from that country's IBAN length.

Check the latest IBAN standards from SWIFT

https://www.swift.com/standards

https://www.swift.com/standards/data-standards/iban

var CODE_LENGTHS = {
    AD: 24, AE: 23, AT: 20, AZ: 28, BA: 20, BE: 16, BG: 22, BH: 22, BR: 29,
    CH: 21, CR: 21, CY: 28, CZ: 24, DE: 22, DK: 18, DO: 28, EE: 20, ES: 24,
    FI: 18, FO: 18, FR: 27, GB: 22, GI: 23, GL: 18, GR: 27, GT: 28, HR: 21,
    HU: 28, IE: 22, IL: 23, IS: 26, IT: 27, JO: 30, KW: 30, KZ: 20, LB: 28,
    LI: 21, LT: 20, LU: 20, LV: 21, MC: 27, MD: 24, ME: 22, MK: 19, MR: 27,
    MT: 31, MU: 30, NL: 18, NO: 15, PK: 24, PL: 28, PS: 29, PT: 25, QA: 29,
    RO: 24, RS: 22, SA: 24, SE: 24, SI: 19, SK: 24, SM: 27, TN: 24, TR: 26
};
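One possible way to turn a length table like this into a per-country regex that tolerates spaces is sketched below. It is only an illustration: `LENGTHS` is a three-country excerpt, and `ibanPatternFor` and `matchesCountryFormat` are hypothetical helper names. The pattern accepts an optional single space before each character instead of hard-coding the groups of four, so the French grouping from the question also passes.

```javascript
// Assumption: a three-country excerpt of the length table above.
const LENGTHS = { FR: 27, NO: 15, PT: 25 };

// Hypothetical helper: builds an anchored pattern for one country code.
// Country code, 2 check digits, then (length - 4) alphanumerics, each of
// which may be preceded by a single space.
function ibanPatternFor(country) {
    const length = LENGTHS[country];
    if (!length) return null; // unknown country code
    return new RegExp(
        '^' + country + '[0-9]{2}(?: ?[0-9A-Z]){' + (length - 4) + '}$'
    );
}

// Hypothetical helper: detects the country from the first two characters
// and tests the candidate against that country's pattern.
function matchesCountryFormat(candidate) {
    const upper = candidate.trim().toUpperCase();
    const pattern = ibanPatternFor(upper.slice(0, 2));
    return pattern !== null && pattern.test(upper);
}

console.log(matchesCountryFormat('PT50 0002 0123 1234 5678 9015 4')); // true
console.log(matchesCountryFormat('FR76 30003 02420 002202XXXXX 77')); // true
console.log(matchesCountryFormat('PT50 0002 0123'));                  // false
```

This still only checks the shape of one candidate string; it neither finds IBANs inside a larger text nor verifies the check digits.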

Edit 2:

To answer Ingo Leonhardt: check the IBAN Registry (PDF) at https://www.swift.com/standards/data-standards/iban

a) Norway has the shortest BBAN, at 11 characters.

b) In an IBAN, only characters 9-15 have to be numeric ([0-9]{7}). For example, KZ86 125K ZT50 0410 0100 is valid for Kazakhstan.

Sully
  • Again, the code provided is a validator, not a detector (so how do I find the country code without first detecting the IBAN pattern in a whole text?) – Aure77 Nov 15 '16 at 21:22
  • It has to be programmatic I believe. I will sketch a method later today when I have time. – Sully Nov 16 '16 at 06:16
  • I'm puzzled by the part '[0-9]{7}', and even in the link you've provided there's a comment that says there are exceptions. Do you have a source that says that a) all BBANs must have at least 11 characters and b) characters 5 to 11 must be numeric? – Ingo Leonhardt Mar 03 '17 at 15:22
  • @IngoLeonhardt, check Edit 2. – Sully Mar 04 '17 at 21:42

It'd be reasonable to split the solution into 2 steps:

  1. Fetch the suspects in the text;
  2. Parse and (optionally) validate each suspect.

Here is an implementation of this approach. We'll need the list of IBAN lengths for each country. In my case the list of countries was very limited, so I'm using the one from @Sully's answer here. It's strongly advised to get the most recent list though, as more countries may join.

const CODE_LENGTHS = {
    AD: 24, AE: 23, AT: 20, AZ: 28, BA: 20, BE: 16, BG: 22, BH: 22, BR: 29,
    CH: 21, CR: 21, CY: 28, CZ: 24, DE: 22, DK: 18, DO: 28, EE: 20, ES: 24,
    FI: 18, FO: 18, FR: 27, GB: 22, GI: 23, GL: 18, GR: 27, GT: 28, HR: 21,
    HU: 28, IE: 22, IL: 23, IS: 26, IT: 27, JO: 30, KW: 30, KZ: 20, LB: 28,
    LI: 21, LT: 20, LU: 20, LV: 21, MC: 27, MD: 24, ME: 22, MK: 19, MR: 27,
    MT: 31, MU: 30, NL: 18, NO: 15, PK: 24, PL: 28, PS: 29, PT: 25, QA: 29,
    RO: 24, RS: 22, SA: 24, SE: 24, SI: 19, SK: 24, SM: 27, TN: 24, TR: 26
};

// We use a regexp to find the suspects.
// An IBAN starts with a 2-letter country code followed by 2 check digits.
// So 4 symbols + the country-specific account number.
// The longest IBAN in this list is 31 symbols
// so the IBAN 'tail' is at most 31 - 4 = 27 characters.
const MAX_LENGTH = Math.max(...Object.values(CODE_LENGTHS)) - 4;

// Let's say we want to highlight all the IBANs in the text.
const HighlightIBAN = text => text.replace(
    // Here is the magic. We search for 2 letters followed by 2 digits
    // and then up to 27 meaningful characters.
    // The group (?:\W*[a-z0-9]) lets us ignore spaces, dots,
    // dashes or whatever a user may put as number group separators.
    new RegExp('[a-z]{2}[0-9]{2}(?:\\W*[a-z0-9]){0,' + MAX_LENGTH + '}', 'ig'),
    suspect => {
        // Here we have our suspect.
        // Just need to check if the IBAN is a IBAN indeed.
        const country = suspect.slice(0, 2).toUpperCase();
        const length = CODE_LENGTHS[country];

        // If the first 2 letters aren't a country code (not in the list)
        // we ignore the string found by the regex
        if(!length) return suspect;

        // Now we check if the finding contains the exact number
        // of meaningful symbols and extract the IBAN
        // using the same approach to ignore formatting
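        // [\s\S]* (rather than .*) lets the tail span line breaks,
        // which \W* may have pulled into the suspect.
        const checkRegexp = new RegExp('^((?:\\W*[a-z0-9]){' + length + '})([\\s\\S]*)$', 'i');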
        const parts = checkRegexp.exec(suspect);

        // Nothing extracted. The suitable sequence of symbols not found.
        // So we ignore the suspect again.
        if(!parts) return suspect;
        const iban = parts[1];
        const tail = parts[2];

        // We have our IBAN here.
        // The 'tail' is the symbols which follow the IBAN.
        // As the regex greedily picks up to 31 characters, the string found
        // may end with several symbols which shouldn't be touched.

        // The extracted IBAN can be validated here (e.g. checksum and stuff)
        // like @Sully suggested.
        // Then we do our job. E.g. we can highlight the IBAN found.
        // Just don't forget to put the 'tail' back.
        return `<span class="iban">${iban}</span>${tail}`;
    }
);

The example usage can be found here: https://jsfiddle.net/qkwx5ja7/1
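If you want to extract the IBANs rather than highlight them, the same two steps can be sketched as below, together with the checksum validation mentioned in the code comments. This is a variation on the code above, not a drop-in replacement: `LENGTHS` is a small excerpt of the table, `extractIbans` and `hasValidChecksum` are hypothetical names, and the check is the standard ISO 13616 mod-97 test. Unlike the highlighter, it resumes scanning right after each extracted IBAN, so a second IBAN swallowed by a greedy suspect is not lost.

```javascript
// Assumption: excerpt of the CODE_LENGTHS table; use the full, current list.
const LENGTHS = { FR: 27, GB: 22, NO: 15, PT: 25 };
const MAX_TAIL = Math.max(...Object.values(LENGTHS)) - 4;

// ISO 13616 mod-97 check: move the first 4 characters to the end,
// read letters as numbers (A=10 ... Z=35), the remainder must be 1.
function hasValidChecksum(iban) {
    const rearranged = iban.slice(4) + iban.slice(0, 4);
    let remainder = 0;
    for (const ch of rearranged) {
        const value = parseInt(ch, 36); // '0'-'9' -> 0-9, 'A'-'Z' -> 10-35
        // A letter contributes two decimal digits, a digit contributes one.
        remainder = (value > 9 ? remainder * 100 + value : remainder * 10 + value) % 97;
    }
    return remainder === 1;
}

// Returns the IBANs found in the text, upper-cased and without separators.
function extractIbans(text) {
    const suspects = new RegExp(
        '[a-z]{2}[0-9]{2}(?:\\W*[a-z0-9]){0,' + MAX_TAIL + '}', 'ig');
    const found = [];
    let match;
    while ((match = suspects.exec(text)) !== null) {
        const length = LENGTHS[match[0].slice(0, 2).toUpperCase()];
        // Take exactly `length` meaningful characters from the suspect.
        const head = length
            ? new RegExp('^(?:\\W*[a-z0-9]){' + length + '}', 'i').exec(match[0])
            : null;
        const iban = head ? head[0].replace(/\W+/g, '').toUpperCase() : null;
        if (iban && hasValidChecksum(iban)) {
            found.push(iban);
            // Resume right after the IBAN, not after the greedy suspect.
            suspects.lastIndex = match.index + head[0].length;
        } else {
            // Rejected: rescan from the next character.
            suspects.lastIndex = match.index + 1;
        }
    }
    return found;
}

console.log(extractIbans(
    'Pay to NO93 8601 1117 947 or GB82-WEST-1234-5698-7654-32, ' +
    'not to PT50 0002 0123 1234 5678 9015 1.'
));
// [ 'NO9386011117947', 'GB82WEST12345698765432' ]
```

The last IBAN in the sample text is skipped because its check digits do not match; drop the `hasValidChecksum` call if you want to keep such candidates.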

Sergey