3

I'm trying to return and print the ICCID of a SIM card in a device; the SIM cards are from various suppliers and therefore of differing lengths (either 19 or 20 digits). As a result, I'm looking for a regular expression that will extract the ICCID (in a way that's agnostic to non-word characters immediately surrounding it).

Given that an ICCID is specified as a 19-20 digit string starting with "89", I've simply gone for:

(89\d{17,18})

This was the most successful pattern that I'd tested (along with some patterns rejected for reasons below).

In the string that I'm extracting it from, the ICCID is immediately followed by a carriage return and then a line feed, but my attempts to terminate the pattern with \r, \n, or even \b all failed (the program that I'm using is an in-house one built on Python, so I suspect that's the regex flavour it uses). Also, simply using (\d{19,20}) ended up extracting the last 19 digits of a 20-digit ICCID (as the third and last valid match). Along the same lines, I ruled out (\d{19,20})? in principle, as I expect that to finish when it finds the first 19 digits.

So my question is: Should I use the pattern I've chosen, or is there a better expression (not using non-word characters to frame the string) that will return the longest substring of a variable-length string of digits?

Myles
  • It really depends on the format of the file/text that you want to parse. I advise you to 'tune' your expression in a regexp tester like https://regex101.com/#python (use "g" mode to simulate searching) – Sergey Belash Sep 23 '16 at 12:53
  • I do not understand why `\d{19,20}` only matches 19 out of 20 chars - the quantifier is greedy. – Wiktor Stribiżew Sep 23 '16 at 13:00
  • @WiktorStribiżew I suspect that it matched the first 19 digits, then all 20 digits, then the last 19 digits. As that's the last match it got, that's the one it returns. – Myles Sep 23 '16 at 14:23
  • Well, if you could provide more details, examples of the text you try your regex against, what exact matches you get, maybe the tool itself or how it works we could provide more specific help. – Wiktor Stribiżew Sep 23 '16 at 14:51
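
To make the greediness point from the comments concrete, here is a minimal sketch of how the standard Python `re` module treats these quantifiers. It assumes the in-house tool behaves like plain `re`, which the thread does not confirm; the sample string and the helper `first_match` are made up for illustration.

```python
import re

# Hypothetical sample: a made-up 20-digit ICCID followed by CR LF.
SAMPLE = "ICCID: 89012345678901234567\r\n"

def first_match(pattern, text=SAMPLE):
    """Return the text of the first match of `pattern` in `text`, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Greedy {19,20} tries 20 digits first, so the full run is returned.
greedy = first_match(r"\d{19,20}")

# The lazy form {19,20}? stops as soon as 19 digits have matched.
lazy = first_match(r"\d{19,20}?")

# A lookahead for CR LF anchors the end without including it in the match.
terminated = first_match(r"\d{19,20}(?=\r\n)")

print(len(greedy), len(lazy), len(terminated))
```

If the tool returns something different for the greedy pattern, that would suggest it post-processes matches (for example, by keeping the last of several overlapping candidates) rather than using `re.search` directly.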

3 Answers

1

I'd go for

89\d{17,18}[^\d]

The quantifier is greedy, so this prefers 18 digits after the "89" but falls back to 17. The trailing [^\d] then requires that the next character is not a digit.

Only limitation: there must be at least one more character after the ICCID (which should be okay from what you described), and that character becomes part of the overall match.

Be aware that the tail of any longer digit sequence, "89" followed by 17 or 18 digits, would also match, since nothing restricts what comes before the "89".
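
A minimal Python sketch of this pattern, assuming the standard `re` module: the capture group is an addition for illustration, so that the trailing non-digit stays out of the extracted value. The response string and ICCID digits are made up.

```python
import re

# Hypothetical device response; the 19-digit ICCID value is made up.
RESPONSE = "ICCID: 8901234567890123456\r\nOK\r\n"

# The group captures only the digits; \D (same as [^\d]) consumes one
# trailing non-digit character.
PATTERN = re.compile(r"(89\d{17,18})\D")

match = PATTERN.search(RESPONSE)
whole = match.group(0)   # digits plus the consumed non-digit (here "\r")
iccid = match.group(1)   # digits only

# With nothing after the digits, the pattern cannot match at all.
at_end = PATTERN.search("8901234567890123456")

print(repr(whole), iccid, at_end)
```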

freefall
  • There are so many different solutions to this. But this should work well enough. – freefall Sep 23 '16 at 12:51
  • Note that `Python` offers `\D` as well as `[^\d]` - if you want to allow **zero or more letters**, I'd go for `\D*`. – Jan Sep 23 '16 at 12:57
1

If the engine behind the scenes is really Python, and there can be any non-digit characters around the value you need to extract, use lookarounds to restrict the context around the value:

(?<!\d)89\d{17,18}(?!\d)
^^^^^^^           ^^^^^^

The (?<!\d) negative lookbehind requires the absence of a digit before the match, and the (?!\d) negative lookahead requires the absence of a digit after it. Neither consumes any characters, so the match contains only the ICCID digits.

See this regex demo
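
As a rough usage sketch with the standard Python `re` module (the helper name `extract_iccid` and all sample strings are hypothetical, not taken from the asker's tool):

```python
import re

# Lookarounds assert "no digit here" without consuming any character.
ICCID_RE = re.compile(r"(?<!\d)89\d{17,18}(?!\d)")

def extract_iccid(text):
    """Return the first 19- or 20-digit ICCID in `text`, or None if absent."""
    m = ICCID_RE.search(text)
    return m.group(0) if m else None

# Hypothetical inputs; the digit strings are made up.
twenty = extract_iccid("ICCID=89012345678901234567\r\n")   # 20 digits, then CR LF
nineteen = extract_iccid("8901234567890123456")            # 19 digits, end of string
embedded = extract_iccid("id 1189012345678901234567 end")  # inside a 22-digit run

print(twenty, nineteen, embedded)
```

Because the lookarounds also succeed at the start and end of the string, this works even when nothing follows the ICCID, and a candidate embedded in a longer digit run is rejected.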

Wiktor Stribiżew
0
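(\d+)\D+

seems like it would do the trick readily. (\d+) greedily captures the whole run of digits (all 20 for a 20-digit ICCID), and \D+ matches the non-digit characters that follow. Note that it captures the first run of digits it finds, whatever its length, so it only works if the ICCID is the first number in the text.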
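
A small Python sketch of how this pattern behaves, assuming the standard `re` module; the helper `first_digit_run` and the sample strings are hypothetical.

```python
import re

def first_digit_run(text):
    """Return the first run of digits that is followed by a non-digit, or None."""
    m = re.search(r"(\d+)\D+", text)
    return m.group(1) if m else None

# Made-up inputs: the ICCID is the only number in the first string,
# but is preceded by another number in the second.
only_number = first_digit_run("ICCID: 89012345678901234567\r\n")
other_number_first = first_digit_run("slot 2: 89012345678901234567\r\n")

print(only_number, other_number_first)
```

The second call returns the slot number rather than the ICCID, which is why a length-restricted pattern is the safer choice when other digits may appear in the text.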

A_Elric