
I am trying to scrape a page and have trouble checking whether a BeautifulSoup element contains numbers. If it does, I would like to clean the string and keep only the number, which is a zip code. But before I clean it, I have to check whether the element has a zip code at all.

I find the element with the following code:

soup.find("span",{"class": "locality"}).get_text()
Output: 68549 Ilvesheim, Baden-Württemberg, 

I tried to check the string with the following code, but it always returns False:

soup.find("span",{"class": "locality"}).get_text().isalnum()
soup.find("span",{"class": "locality"}).get_text().isdigit()

Is there another way to check it? Since the string contains "68549", I expect the check to return True.

Nika
  • Possible duplicate of [check if a string contains a number](https://stackoverflow.com/questions/19859282/check-if-a-string-contains-a-number) – Mark Mar 10 '18 at 18:18

3 Answers


You could use this simple function to check if a string contains numbers:

def hasNumbers(inputString):
    return any(char.isdigit() for char in inputString)
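
For context, the checks in the question fail because str.isdigit() and str.isalnum() return True only when every character in the string qualifies, and the scraped text also contains spaces and commas. A minimal sketch of the difference, using the sample string from the question (the names `has_numbers` and `text` are only for illustration):

```python
def has_numbers(input_string):
    """Return True if at least one character in the string is a digit."""
    return any(char.isdigit() for char in input_string)


# Sample text as shown in the question's output
text = "68549 Ilvesheim, Baden-Württemberg, "

print(text.isdigit())     # False: the string is not made up of digits only
print(text.isalnum())     # False: spaces and commas are not alphanumeric
print(has_numbers(text))  # True: at least one character is a digit
```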

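But I think this is an XY problem, and what you are really looking for is a regex to extract a zip code. Try the following:

\s(\d+)\s (You may have to adjust this depending on the acceptable forms of a zip code. As written, it requires whitespace on both sides of the digits, so it will not match a number at the very start or end of the string.)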

>>> import re
>>> s = 'Output: 68549 Ilvesheim, Baden-Württemberg,'
>>> re.findall(r'\s(\d+)\s', s)
['68549']

If the string does not contain a zip code, re.findall() returns an empty list, so you can detect that case by checking whether the result is empty:

>>> re.findall(r'\s(\d+)\s', 'No zip code here!')
[]
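
Putting the check and the extraction together, a small helper could return the zip code, or None when there is none. This is only a sketch: `extract_zip` is a hypothetical name, and it swaps in \b\d+\b for \s(\d+)\s so that a number at the very start of the string (as in the scraped text) is still found.

```python
import re


def extract_zip(text):
    """Return the first standalone run of digits in text, or None if there is none."""
    # \b on both sides keeps digits embedded in a word (e.g. "abc123") from matching
    match = re.search(r"\b\d+\b", text)
    return match.group(0) if match else None


print(extract_zip("68549 Ilvesheim, Baden-Württemberg, "))  # 68549
print(extract_zip("No zip code here!"))                     # None
```

The text returned by get_text() can be passed straight to the helper, and the None result doubles as the "has no zip code" check.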
user3483203

Using Regex:

import re
hasnumber = re.findall(r'\d+', "68549 Ilvesheim, Baden-Württemberg")
if hasnumber:
    print(hasnumber)

Output:

['68549']
Rakesh

If you are looking for zip codes, you might want to consider the valid formats. For example, German zip codes are exactly 5 digits long:

import re

for test in ['68549 Ilvesheim, Baden-Württemberg', 'test 01234', 'test 2 123456789', 'inside (56089)']:
    # Match exactly five digits with a word boundary on either side
    if re.search(r'\b\d{5}\b', test):
        print("'{}' has zipcode".format(test))

Of these four examples, only the third does not match as a zip code:

'68549 Ilvesheim, Baden-Württemberg' has zipcode
'test 01234' has zipcode
'inside (56089)' has zipcode

The {5} tells the regex to match exactly 5 digits, with \b ensuring a word boundary on either side. If you want five or six digits, use {5,6}.
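
To illustrate the difference between {5} and {5,6}, here is a short sketch comparing the two patterns. The names `FIVE`, `FIVE_OR_SIX` and `check` are made up for this example, and 'test 123456' is an extra sample that is not in the list above.

```python
import re

FIVE = re.compile(r"\b\d{5}\b")           # exactly five digits
FIVE_OR_SIX = re.compile(r"\b\d{5,6}\b")  # five or six digits


def check(sample):
    """Return (matches_five, matches_five_or_six) for a sample string."""
    return bool(FIVE.search(sample)), bool(FIVE_OR_SIX.search(sample))


for sample in ["test 01234", "test 123456", "test 2 123456789"]:
    print(sample, check(sample))
```

Only the second sample differs between the two patterns: six digits fail {5} because no word boundary falls after the fifth digit, but they satisfy {5,6}. The nine-digit number matches neither.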

Martin Evans