
I am trying to match a mobile number without including the country code in the match. As far as I know, this is only possible with lookaround assertions.

import re

m = "919876543210"
re.match(r"^(?<=91)[0-9]+", m)

But there is no match at all. Can someone kindly point out the mistake here?

EDIT:

The string can contain a number with or without the country code (assume the country code can only be 91), so:

m = "91xxxxxxxxxx" 
m = "xxxxxxxxxx"

The problem is that if I use an optional group (regex = r"(91)?\d+"), the country code is included in the match. How can I handle both cases without including the country code in the result?

schwillr
  • Please provide the definition of `without any country code` ... what does that mean? – Tim Biegeleisen Jun 11 '20 at 06:50
  • If it's just the mobile number to be extracted, excluding the country code, why not use string slicing: `"919876543210"[-10:]`? – sushanth Jun 11 '20 at 06:52
    `^(?<=91)` asserts that the match starts at the start of the input, *and* that the match is preceded by the characters `91`. How can the start of the input be preceded by `91`? – user2357112 Jun 11 '20 at 06:54
  • yes, there can be numbers without country code, or even multiple numbers in a single string. But my question is more about why the regex in that particular example is not working. – schwillr Jun 11 '20 at 07:45
  • Do you have more sample data, e.g. where several phone numbers can be in a single string? I think the comments here and by @gustavrasmussen tell you why `re.match` won't work. – JvdV Jun 11 '20 at 07:52
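user2357112's comment pins down the failure: `^` forces the match to start at position 0, and nothing can precede position 0, so `(?<=91)` can never succeed there. `re.match` has the same restriction even without `^`, because it only tries position 0. A quick check using the string from the question:

```python
import re

m = "919876543210"

# Anchored at position 0: nothing precedes it, so the lookbehind always fails.
print(re.match(r"^(?<=91)[0-9]+", m))   # None
print(re.search(r"^(?<=91)[0-9]+", m))  # None

# re.match only tries position 0, so dropping "^" alone does not help.
print(re.match(r"(?<=91)[0-9]+", m))    # None

# re.search without "^" scans forward and succeeds at index 2,
# the first position preceded by "91".
print(re.search(r"(?<=91)[0-9]+", m).group())  # 9876543210
```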

1 Answer


You can search for multiple country codes using a positive lookbehind with the following pattern (I included the Danish country code, 45, as well):

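import re

phone_numbers = ["919876543210",
                 "9876543210",
                 "455476543210"
                 ]

# Positive lookbehind anchored at the start of the string: the match begins
# right after a leading 91 or 45. Without the ^ anchors, a 91 or 45 later in
# the number could be treated as a country code.
COUNTRY_CODE_RE = re.compile(r"(?<=^91|^45)\d+")


def trim_country_code(phone_num: str) -> str:
    """Remove the country code from phone numbers of length 12;
    otherwise just return the phone number."""

    if len(phone_num) == 12:
        res = COUNTRY_CODE_RE.search(phone_num)
        if res is not None:  # None if the number does not start with 91 or 45
            return res.group()
    return phone_num


for phone_number in phone_numbers:
    print(trim_country_code(phone_number))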

Output:

9876543210
9876543210
5476543210
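One constraint worth knowing about `(?<=91|45)`: Python's `re` only accepts a lookbehind whose alternatives all have the same width, so the pattern works here because both codes are two digits. The sketch below uses Finland's code, 358, purely as an illustrative three-digit code. A common workaround is one fixed-width lookbehind per code, with the alternation placed outside the lookbehinds.

```python
import re

# Both alternatives are two characters wide, so this lookbehind compiles.
same_width = re.compile(r"(?<=91|45)\d+")
print(same_width.search("919876543210").group())  # 9876543210

# 91 and 358 have different widths, so re rejects the lookbehind.
try:
    re.compile(r"(?<=91|358)\d+")
    mixed_ok = True
except re.error as exc:
    mixed_ok = False
    print("re.error:", exc)

# Workaround: separate fixed-width lookbehinds, alternated outside them.
workaround = re.compile(r"(?:(?<=^91)|(?<=^358))\d+")
print(workaround.search("358401234567").group())  # 401234567
```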

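However, this runs into problems with Danish numbers, which are 8 digits long: with the country code they are only 10 digits, so the length-12 check never strips the 45. A simpler and more general approach (without the need for regex) could be: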

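phone_numbers = ["919876543210",
                 "9876543210",
                 "4554765432"
                 ]


def trim_first_two(phone_num: str) -> str:
    # Strip a leading Indian (91) or Danish (45) country code.
    # Caveat: a local number that itself starts with 91 or 45 is trimmed too.
    if phone_num.startswith(("45", "91")):
        return phone_num[2:]
    return phone_num


for phone_number in phone_numbers:
    print(trim_first_two(phone_number))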

Output:

9876543210
9876543210
54765432
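The prefix check above also trims a local number that happens to begin with 91 or 45. For the case in the question's EDIT (a number with or without a leading 91, where `(91)?\d+` keeps the 91 in the match), here is a minimal sketch assuming the local number is always exactly 10 digits and 91 is the only code: put the country code in a non-capturing optional group and read the number from capture group 1 instead of the whole match. `local_number` and `local_numbers` are hypothetical helper names.

```python
import re
from typing import List

# Assumptions (not stated in the original post): the local mobile number is
# exactly 10 digits and the only country code to strip is 91.
# (?:91)? is non-capturing, so the country code never lands in group 1.
SINGLE_RE = re.compile(r"(?:91)?(\d{10})")
MANY_RE = re.compile(r"\b(?:91)?(\d{10})\b")


def local_number(m: str) -> str:
    """Return the 10-digit local number, or raise ValueError."""
    res = SINGLE_RE.fullmatch(m)
    if res is None:
        raise ValueError(f"not a mobile number: {m!r}")
    return res.group(1)  # group(0) would still include the 91


def local_numbers(text: str) -> List[str]:
    """Find every number in free text; findall returns only group 1."""
    return MANY_RE.findall(text)


print(local_number("919876543210"))  # 9876543210
print(local_number("9876543210"))    # 9876543210
print(local_number("9123456789"))    # 9123456789 (local number starting with 91)
print(local_numbers("call 919876543210 or 9123456789"))
# ['9876543210', '9123456789']
```

If the optional `91` is consumed but fewer than 10 digits remain, the regex engine backtracks and skips the optional group, which is why a local number beginning with 91 survives intact.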
Gustav Rasmussen
  • Thanks! Turns out I only needed to use search instead of match – schwillr Jun 11 '20 at 07:38
  • match is the same as search, just corresponding to having a "^" in the beginning of the search regex/pattern. Therefore I recommend just using search instead. In my opinion, match is a redundant method in the Python re library. – Gustav Rasmussen Jun 11 '20 at 07:39
  • How can we make it work for both m="919876543210" and m="9876543210" (i.e. with or without the country code; assume the country code can only be 91)? The issue is that if I use an optional group with '?', the match includes the country code 91. – schwillr Jun 11 '20 at 08:10
    match and search are different in other non-obvious ways too. Refer - https://stackoverflow.com/questions/180986/what-is-the-difference-between-re-search-and-re-match – schwillr Jun 11 '20 at 08:25
    @schwillr see the updated answer. And thanks for the link, I had been looking for a good reason why the match method exists in the re module :) – Gustav Rasmussen Jun 11 '20 at 08:28
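One concrete case where `re.match` is not simply `re.search` with a leading `^` is `re.MULTILINE`: `re.match` still only tries the very start of the string, while `^` in `re.search` can also match right after each newline. A small illustration with a made-up two-line string:

```python
import re

text = "call me\n919876543210"

# re.match only ever tries position 0, even with re.MULTILINE.
print(re.match(r"91\d+", text, re.MULTILINE))  # None

# Without MULTILINE, "^" in re.search also means the start of the string.
print(re.search(r"^91\d+", text))  # None

# With MULTILINE, "^" also matches right after the newline.
print(re.search(r"^91\d+", text, re.MULTILINE).group())  # 919876543210
```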