How to use positive lookbehind (optionally) in a mobile number?

Question

I am trying to capture a mobile number without any country code in the match. As far as I know, this is only possible by using lookaround assertions.

import re

m = "919876543210"
re.match(r"^(?<=91)[0-9]+", m)  # returns None

But there is no match at all. Can someone kindly point out the mistake here?

EDIT:

The string can contain a number with or without the country code (assume the country code can only be 91), so:

m = "91xxxxxxxxxx" 
m = "xxxxxxxxxx"

The problem is that if I use an optional group (regex = r"(91)?\d+"), the country code is included in the match. How can I handle both cases without including the country code in the result?

Please provide the definition of `without any country code` ... what does that mean? — Tim Biegeleisen, Jun 11 '20 at 06:50
If it's just the mobile number to be extracted, excluding the country code, why not use string slicing: `"919876543210"[-10:]`? — sushanth, Jun 11 '20 at 06:52
`^(?<=91)` asserts that the match starts at the start of the input, *and* that the match is preceded by the characters `91`. How can the start of the input be preceded by `91`? — user2357112, Jun 11 '20 at 06:54
Yes, there can be numbers without a country code, or even multiple numbers in a single string. But my question is more about why the regex in that particular example is not working. — schwillr, Jun 11 '20 at 07:45
Do you have more sample data, e.g. where several phone numbers can be in a single string? I think the comments here and by @gustavrasmussen tell you why `re.match` won't work. — JvdV, Jun 11 '20 at 07:52
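To make the point about `^(?<=91)` concrete, here is a small check using the question's own string. It contrasts `re.match`, which only tries position 0, with `re.search`, which tries every position; the second pattern simply drops the `^`.

```python
import re

m = "919876543210"

# re.match only attempts a match at index 0, and "^" pins it there too.
# Nothing precedes index 0, so the lookbehind (?<=91) can never succeed.
print(re.match(r"^(?<=91)[0-9]+", m))  # None

# re.search tries each index in turn; at index 2 the two preceding
# characters are "91", so the lookbehind succeeds and the rest is matched.
print(re.search(r"(?<=91)[0-9]+", m).group())  # 9876543210
```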

Gustav Rasmussen · Accepted Answer · 2020-06-11T10:34:13.013


You can search for multiple country codes using a positive lookbehind with the following pattern (I included the Danish country code as well):

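import re

phone_numbers = ["919876543210",
                 "9876543210",
                 "455476543210"
                 ]

# Lookbehind for either country code (91 = India, 45 = Denmark).
# Both alternatives are 2 characters wide, as Python's lookbehind requires.
COUNTRY_CODE = re.compile(r"(?<=91|45)\d+")


def trim_country_code(phone_num: str):
    """Remove the country code from a phone number of length 12;
    otherwise return the phone number unchanged."""

    if len(phone_num) == 12:
        res = COUNTRY_CODE.search(phone_num)
        if res:  # search returns None if neither code is found
            return res.group()
    return phone_num


for phone_number in phone_numbers:
    print(trim_country_code(phone_number))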

Output:

9876543210
9876543210
5476543210

But this runs into problems with Danish numbers, which have 8 digits: with the 45 prefix they are only 10 characters long, so the length-12 check never strips the code. A simpler and more general approach (no regex needed) could be:

phone_numbers = ["919876543210",
                 "9876543210",
                 "4554765432"
                 ]


def trim_first_two(phone_num: str):
    if phone_num.startswith(("45", "91")):
        return phone_num[2:]
    return phone_num


for phone_number in phone_numbers:
    print(trim_first_two(phone_number))

Output:

9876543210
9876543210
54765432
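If you would rather keep a regex for the "with or without 91" case from the question, one sketch is to swap the lookbehind for a non-capturing optional prefix plus a capturing group, and read the group instead of the whole match. This assumes, as the question does, that 91 is the only country code, and additionally assumes the local number is exactly 10 digits; the names `MOBILE`, `MANY` and `local_number` are hypothetical.

```python
import re

# Assumptions: the only country code is 91 and the local number has
# exactly 10 digits. The prefix is matched but not captured.
MOBILE = re.compile(r"(?:91)?(\d{10})")

# For several numbers in one string: (?<!\d) and (?!\d) stop a match
# from starting or ending in the middle of a longer digit run.
MANY = re.compile(r"(?<!\d)(?:91)?(\d{10})(?!\d)")


def local_number(s: str):
    """Return the 10-digit number without a leading 91, or None."""
    m = MOBILE.fullmatch(s)
    return m.group(1) if m else None


print(local_number("919876543210"))  # 9876543210
print(local_number("9876543210"))    # 9876543210
# findall returns only the captured group, i.e. numbers without the prefix.
print(MANY.findall("call 919876543210 or 9123456789"))  # ['9876543210', '9123456789']
```

Because `\d{10}` must still match after the optional prefix, a 10-digit number that merely starts with 91 (such as "9198765432") is returned whole, whereas a plain `startswith` check would cut its first two digits.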

edited Jun 11 '20 at 10:34

answered Jun 11 '20 at 07:05

Gustav Rasmussen


Thanks! Turns out I only needed to use search instead of match – schwillr Jun 11 '20 at 07:38
match is the same as search, just corresponding to having a "^" in the beginning of the search regex/pattern. Therefore I recommend just using search instead. In my opinion, match is a redundant method in the Python re library. – Gustav Rasmussen Jun 11 '20 at 07:39
How can we make it work for both m="919876543210" and m="9876543210" (i.e. with or without the country code; assume the country code can only be 91)? The issue is that if I use an optional group with '?', the match includes the country code 91. – schwillr Jun 11 '20 at 08:10

match and search are different in other non-obvious ways too. See https://stackoverflow.com/questions/180986/what-is-the-difference-between-re-search-and-re-match – schwillr Jun 11 '20 at 08:25

@schwillr see the updated answer. And thanks for the link, I had been looking for a good reason why the match method exists in the re module :) – Gustav Rasmussen Jun 11 '20 at 08:28
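One such difference, documented in the Python `re` docs, is that `re.match` only matches at the beginning of the string even in `re.MULTILINE` mode, whereas `^` in a `re.search` pattern then matches at the start of every line. A small check, with an assumed sample string:

```python
import re

# Hypothetical sample: the number sits on the second line.
text = "phone:\n919876543210"

# With re.MULTILINE, "^" matches at the start of each line,
# so search finds the number on line 2.
print(re.search(r"^\d+", text, re.MULTILINE).group())  # 919876543210

# re.match still only tries index 0 of the whole string, MULTILINE or not.
print(re.match(r"\d+", text, re.MULTILINE))  # None
```

So `re.match(p, s)` is not always interchangeable with `re.search("^" + p, s)`.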
