How to check if a Beautiful Soup object contains numbers

Question

I am trying to scrape a page, and I am having trouble checking whether a BeautifulSoup element contains numbers. If it does, I would like to clean the string and keep just the number, which is a zip code. But before I clean it, I have to check whether the element contains a zip code at all.

I find the element with the following code:

soup.find("span",{"class": "locality"}).get_text()
Output: 68549 Ilvesheim, Baden-Württemberg,

I tried to check the string with the following code, but it always returns False:

soup.find("span", {"class": "locality"}).get_text().isalnum()
soup.find("span", {"class": "locality"}).get_text().isdigit()

Is there another way to check it? Since it contains "68549", it should return True.

Possible duplicate of [check if a string contains a number](https://stackoverflow.com/questions/19859282/check-if-a-string-contains-a-number) — Mark, Mar 10 '18 at 18:18

user3483203 · Answer 1 · answered Mar 10 '18 at 18:42

You could use this simple function to check if a string contains numbers:

def hasNumbers(inputString):
    return any(char.isdigit() for char in inputString)
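For comparison, the str methods tried in the question test every character of the string: isdigit() is True only if all characters are digits, and isalnum() becomes False as soon as a space or comma appears. A minimal check, assuming the text returned by get_text() is exactly the sample string from the question:

```python
def hasNumbers(inputString):
    # True as soon as any single character is a digit
    return any(char.isdigit() for char in inputString)

# Sample text from the question (assumed to be the exact get_text() result)
text = "68549 Ilvesheim, Baden-Württemberg,"

print(text.isdigit())           # False: letters, spaces and commas are not digits
print(text.isalnum())           # False: spaces and commas are not alphanumeric
print(hasNumbers(text))         # True: '6' is a digit
print(hasNumbers("Ilvesheim"))  # False: no digits at all
```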

But I think this is an XY problem, and what you are really looking for is a regex to extract a zip code. Check out the following:

\s(\d+)\s (you may have to adjust this depending on the accepted formats of a zip code)

>>> import re
>>> s = 'Output: 68549 Ilvesheim, Baden-Württemberg,'
>>> re.findall(r'\s(\d+)\s', s)
['68549']

If the string does not contain a zip code, re.findall() returns an empty list, so you can check for this by making sure the length of its result is 0:

>>> re.findall(r'\s(\d+)\s', 'No zip code here!')
[]
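One hedged caveat: \s(\d+)\s requires a whitespace character on both sides, so in the example above it only matches because of the space after "Output:". If get_text() returns text that starts directly with the zip code, the pattern finds nothing. A sketch using word boundaries (\b) instead; extract_zip is a hypothetical helper name, not part of any library:

```python
import re

text = "68549 Ilvesheim, Baden-Württemberg,"  # zip code at the very start

# \s must consume a whitespace character, and there is none before 68549
print(re.findall(r'\s(\d+)\s', text))  # []

# \b is a zero-width word boundary, which also matches at the start of the string
print(re.findall(r'\b(\d+)\b', text))  # ['68549']


def extract_zip(text):
    """Hypothetical helper: return the first standalone number, or None."""
    match = re.search(r'\b(\d+)\b', text)
    return match.group(1) if match else None


print(extract_zip(text))                 # 68549
print(extract_zip("No zip code here!"))  # None
```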

score 0 · Answer 2 · answered Mar 10 '18 at 18:22

Using a regex:

import re
hasnumber = re.findall(r'\d+', "68549 Ilvesheim, Baden-Württemberg")
if hasnumber:
    print(hasnumber)

Output:

['68549']
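Note that \d+ matches every run of digits, not only zip codes. With a hypothetical address that also contains a house number (the string below is made up for illustration), findall returns both numbers, so you may still need to pick the right one, for example by length as the next answer does:

```python
import re

# Hypothetical address text with a house number as well as a zip code
text = "Hauptstraße 12, 68549 Ilvesheim, Baden-Württemberg"

numbers = re.findall(r'\d+', text)
print(numbers)  # ['12', '68549']

# An empty list is falsy, so "if numbers:" doubles as the "contains numbers" check
if numbers:
    # Keep only runs of exactly five digits (the length of a German zip code)
    zips = [n for n in numbers if len(n) == 5]
    print(zips)  # ['68549']
```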

answered by Rakesh

score 0 · Answer 3 · answered Mar 10 '18 at 19:37

If you are looking for zip codes, you might want to consider their valid format. For example, German ZIP codes are exactly 5 digits long:

import re

for test in ['68549 Ilvesheim, Baden-Württemberg', 'test 01234', 'test 2 123456789', 'inside (56089)']:
    # \b\d{5}\b: exactly five digits with a word boundary on each side
    if re.findall(r'\b\d{5}\b', test):
        print("'{}' has zipcode".format(test))

Of these four examples, only the third ('test 2 123456789') does not match as a zip code:

'68549 Ilvesheim, Baden-Württemberg' has zipcode
'test 01234' has zipcode
'inside (56089)' has zipcode

The {5} tells the regex to match exactly 5 digits, with \b ensuring a word boundary on either side. If you want five or six digits, use {5,6}.
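A minimal sketch of the {5,6} variant, using a precompiled pattern and re.search (which returns a match object or None) instead of findall, so the matched zip code can be printed directly. Apart from the first, the test strings are made up for illustration:

```python
import re

# Five or six digits, with a word boundary on both sides
ZIP_PATTERN = re.compile(r'\b\d{5,6}\b')

tests = ['68549 Ilvesheim, Baden-Württemberg', 'test 123456', 'test 1234567', 'test 1234']

for test in tests:
    match = ZIP_PATTERN.search(test)
    if match:
        print("'{}' has zipcode {}".format(test, match.group()))
    else:
        print("'{}' has no zipcode".format(test))
```

Here the seven-digit and four-digit strings are rejected, because no run of five or six digits in them is bounded on both sides.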
