0

I'm using Python 3.7. I'm having difficulty extracting a number from the beginning of a string. The string is derived from an HTML element, like so:

elt.text
'3 reviews'

However, when I try to get the number using the logic from "Extract Number from String in Python", I get the error below:

int(filter(str.isdigit, elt.text))
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: int() argument must be a string, a bytes-like object or a number, not 'filter'

Is there a better way to get the number from the beginning of the string?

Dave
  • `filter` returns an iterator, not just the number you are looking for. You cannot cast a `filter` to an `int`. The error is saying `int() argument must not be a 'filter' object` – Bill S. Jan 04 '19 at 21:54

4 Answers

3

As the comments on that answer note, in Python 3, filter returns a lazy filter object (an iterator, not a string), so you must iterate over it and build a new string before you can call int:

>>> s = '3 reviews'
>>> filter(str.isdigit, s)
<filter object at 0x800ad5f98>
>>> int(''.join(filter(str.isdigit, s)))
3

However, as other answers in that same thread point out, this is not necessarily a good way to do the job at all:

>>> s = '3 reviews in 12 hours'
>>> int(''.join(filter(str.isdigit, s)))
312

It might be better to use a regular expression matcher to find the number at the front of the string. You can then decide whether to allow signs (+ and -) and leading white-space:

>>> import re
>>> m = re.match(r'\s*([-+])?\d+', s)
>>> m
<_sre.SRE_Match object; span=(0, 1), match='3'>
>>> m.group()
'3'
>>> int(m.group())
3

Now if your string does not start with a number, m will be None, and if the number has a sign, the sign is kept:

>>> m = re.match(r'\s*([-+])?\d+', 'not a number')
>>> print(m)
None
>>> m = re.match(r'\s*([-+])?\d+', '  -42')
>>> m
<_sre.SRE_Match object; span=(0, 5), match='  -42'>
>>> int(m.group())
-42

If you wish to inspect what came after the number, if anything, add more to the regular expression (including some parentheses for grouping). Note that in the pattern above group 1 is the optional sign, so put the sign and the digits inside one pair of parentheses, as in ([-+]?\d+), if you want m.group(1) to return the matched number. Replace \d+ with \d* to allow an empty number-match, if that's meaningful (but then be mindful of matching a lone - or + sign, if you still allow signs).
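A minimal sketch of that grouping idea, assuming you want both the leading integer and whatever follows it. The pattern, the group names `number` and `rest`, and the helper `split_leading_int` are illustrative choices, not part of the original answer:

```python
import re

# Hypothetical pattern: optional leading whitespace, an optional sign plus
# digits captured together as "number", then everything else as "rest".
LEADING_INT = re.compile(r'\s*(?P<number>[-+]?\d+)(?P<rest>.*)', re.DOTALL)


def split_leading_int(text):
    """Return (number, rest), or None if text does not start with an integer."""
    m = LEADING_INT.match(text)
    if m is None:
        return None
    return int(m.group('number')), m.group('rest')


print(split_leading_int('3 reviews in 12 hours'))  # (3, ' reviews in 12 hours')
print(split_leading_int('  -42'))                  # (-42, '')
print(split_leading_int('not a number'))           # None
```

Named groups avoid having to count parentheses when the pattern grows; numbered groups would work just as well here.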

torek
0

The easiest way, if the number is always at the beginning of the string and is a single digit:

number = int(elt.text[0])

Or for more than one digit:

number = int(elt.text.split()[0])
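
If the number might not be followed by whitespace, one possible variant is to take only the leading run of digit characters. This is a sketch under that assumption; the helper name `leading_digits` is made up for illustration, and it does not handle signs or leading spaces:

```python
from itertools import takewhile


def leading_digits(text):
    """Return the int formed by the digits at the start of text, or None."""
    # str.isdecimal is used rather than str.isdigit because isdigit also
    # accepts characters such as superscript digits, which int() rejects.
    digits = ''.join(takewhile(str.isdecimal, text))
    return int(digits) if digits else None


print(leading_digits('3 reviews'))  # 3
print(leading_digits('34x'))        # 34
print(leading_digits('reviews'))    # None
```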
user2379875
  • This would work assuming the character after the number is a space. It probably is but this wouldn't work in a case like `elt.text = "34x"` – Dan Jan 04 '19 at 21:58
  • True, just giving a simple approach if the format is expected to be the same in all cases. – user2379875 Jan 04 '19 at 21:59
0

You can amend the top answer in the link you sent to this:

str1 = "3158 is a great number"
print(int("".join(filter(str.isdigit, str1))))
#3158

As to why the answer doesn't work now, I'm not sure.

Dan
0

There's a more intuitive way to do it. I'll assume that more than one number may appear in a given string, so you want to iterate over the words of the input.

numbers = [int(s) for s in input_string.split(' ') if s.isdigit()]

The first element of the list is the first number found in the given string; you can get it with numbers[0].

If you are certain that the first 'element' of the input string is always a number, you can just split the string by spaces (or whatever separator you are using) and cast the first piece to an integer or float.

int(input_string.split(' ')[0])
# or
float(input_string.split(' ')[0])

If you aren't certain, wrap it in a try block and take the result of either the successful try or the except branch.
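
A rough sketch of that try/except approach, assuming you want an int when possible, a float otherwise, and a fallback value when the first token is not a number. The function name `first_number` and the `default` parameter are hypothetical, and `split()` with no argument is used here so that any run of whitespace counts as a separator:

```python
def first_number(input_string, default=None):
    """Parse the first whitespace-separated token as int, else float."""
    tokens = input_string.split()
    if not tokens:
        return default
    for convert in (int, float):
        try:
            return convert(tokens[0])
        except ValueError:
            # This conversion failed; try the next one, if any.
            continue
    # Note: float() also accepts tokens such as 'nan' and 'inf', so filter
    # those out beforehand if they can appear in your input.
    return default


print(first_number('3 reviews'))   # 3
print(first_number('4.5 stars'))   # 4.5
print(first_number('no reviews'))  # None
```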