4

I need your help: I need to find all phone numbers in a passage of text, so I need to match several number formats, e.g. +420 123 123 123, 123 123 123, +420123123123 and/or 123123123.

If I use a regex pattern with the search method it works perfectly, but if I use the findall method it just returns a list of whitespace.

import re

def detect_numbers(text):
    phone_regex = re.compile(r"(\+420)?(\s*)?\d{3}(\s*)?\d{3}(\s*)?\d{3}")
    print(phone_regex.findall(text))
Scott Anderson
QClodd
  • do the longer numbers always start with 420? – depperm May 29 '18 at 14:24
  • 1
    https://docs.python.org/3/library/re.html#re.findall Findall returns lists of tuples, with each tuple representing the groups from one match. You are grouping the whitespaces but you're not grouping the actual digits. – Tom Dalton May 29 '18 at 14:26
  • Change your groups into non-capturing groups. (?:\+420)? (?:\s*)? etc. Or capture the digits not white spaces. – Taku May 29 '18 at 14:27

4 Answers

3

https://docs.python.org/3/library/re.html#re.findall

Findall returns lists of tuples, with each tuple representing the groups from one match. You are grouping the whitespaces but you're not grouping the actual digits.

Try a regex that groups the digits too:

r"(\+420)?(\s*)?(\d{3})(\s*)?(\d{3})(\s*)?(\d{3})"

E.g.

import re

def detect_numbers(text):
    phone_regex = re.compile(r"(\+420)?\s*?(\d{3})\s*?(\d{3})\s*?(\d{3})")
    print(phone_regex.findall(text))

detect_numbers("so I need to match +420 123 123 123, also 123 123 123, also +420123123123 and also 123123123. Can y")

prints:

[('+420', '123', '123', '123'), ('', '123', '123', '123'), ('+420', '123', '123', '123'), ('', '123', '123', '123')]

You could then string-join the group matches to get the numbers, e.g.

def detect_numbers(text):
    phone_regex = re.compile(r"(\+420)?\s*?(\d{3})\s*?(\d{3})\s*?(\d{3})")
    groups = phone_regex.findall(text)
    for g in groups:
        print("".join(g))

detect_numbers("so I need to match +420 123 123 123, also 123 123 123, also +420123123123 and also 123123123. Can y")

prints:

+420123123123
123123123
+420123123123
123123123
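
As an alternative to joining the tuples, the pattern can use a non-capturing group, as suggested in the comments on the question. The following is only a sketch: it assumes the same +420 prefix and three-digit grouping as above, and stripping whitespace afterwards with re.sub is my own addition, not part of the original answer.

```python
import re

# (?:...) groups without capturing, so findall returns whole matches.
PHONE_REGEX = re.compile(r"(?:\+420)?\s*\d{3}\s*\d{3}\s*\d{3}")

def detect_numbers(text):
    # Each match is one string; remove any whitespace it contains.
    return [re.sub(r"\s+", "", m) for m in PHONE_REGEX.findall(text)]

print(detect_numbers("call +420 123 123 123 or 123123123"))
```

With no capturing groups in the pattern, findall has nothing to split the match into, so each list element is the full matched text.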
Tom Dalton
  • 1
    There's a difference between matching and grouping. `r"This (is a) test"` will *match* "This is a test", and there will be one subgroup containing "is a". – Tom Dalton May 29 '18 at 14:32
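
A small sketch of that distinction, using the same pattern as the comment. The variable names are only illustrative.

```python
import re

m = re.search(r"This (is a) test", "This is a test")
whole = m.group(0)  # the full text the pattern matched
inner = m.group(1)  # only what the parentheses captured

# findall reports the captured group, not the whole match.
found = re.findall(r"This (is a) test", "This is a test")

print(whole)
print(inner)
print(found)
```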
2

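Try a regex like the one below, which matches any character that is not a digit, space, + or -, to check that a mobile number contains only those characters:

"/[^0-9 +\-]/"

If you want the number to start with a particular prefix, use something like this:

preg_match('/\+420\d{9}/', $mobilenumber)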
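
The snippet above is PHP, while the question uses Python. A rough Python equivalent might look like the sketch below; has_czech_prefix is a hypothetical name, and the pattern assumes the digits are written without spaces.

```python
import re

def has_czech_prefix(mobile_number):
    # re.search plays the role of preg_match here: it reports whether
    # "+420" followed by nine digits occurs anywhere in the string.
    return re.search(r"\+420\d{9}", mobile_number) is not None

print(has_czech_prefix("+420123123123"))
```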
somesh
1

Let's assume your text is relatively well behaved. Then a simple approach is to use re.findall to recover every run of at least nine digits, spaces and hyphens (starting and ending with a digit), optionally preceded by a +.

Unless your text contains some weird artifacts or arithmetic operations, this should do the trick. Furthermore, being loose about the format lets you find numbers that contain formatting errors.

import re

def find_phone_numbers(text):
    # Raw string so that \+ and \d reach the regex engine unchanged.
    phones = re.findall(r'(?:\+ *)?\d[\d\- ]{7,}\d', text)
    return [phone.replace('-', '').replace(' ', '') for phone in phones]

Example:

text = "My phone numbers are 123123123, +234-123-3231 and + 555 123 1234"

print(find_phone_numbers(text)) # ['123123123', '+2341233231', '+5551231234']
Olivier Melançon
0

This is because findall returns all non-overlapping matches, whereas search returns only the first match.

findall

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
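
The group rule in that description can be seen with a short sketch. The sample text and patterns here are made up for illustration.

```python
import re

text = "a1 b2"

no_group = re.findall(r"[a-z]\d", text)        # no groups: whole matches
one_group = re.findall(r"[a-z](\d)", text)     # one group: that group only
two_groups = re.findall(r"([a-z])(\d)", text)  # several groups: tuples

print(no_group)
print(one_group)
print(two_groups)
```

The pattern in the question has several capturing groups, all of them optional, which is why its findall result is a list of tuples holding only the prefix and whitespace.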

You can either use search or you can change your regular expression to

^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}$

taken from this post.

vasia