I am trying to extract the name and the number from a string that looks like:

string = '><span>Name</span></p><div class="info"><span>100 years old<'

The problem is that the following pattern does not capture all of the digits:

re.findall('<span>([a-zA-Z]+)</span>(.*)([0-9]+)',string)

Instead, it captures only the last digit of the number ('0' in the example above):

[('Name','</p><div class="info"><span>10','0')]

I want it to return [('Name','</p><div class="info"><span>','100')]


I know that I can do the following to get it working.

re.findall('<span>([a-zA-Z]+)</span>(.*)>([0-9]+)',string)

But why does the first regex not capture all of the digits?

zurfyx

2 Answers

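.* is greedy by default: it first consumes as much of the string as it can and then backtracks only far enough for [0-9]+ to match, which needs just one digit. Changing that quantifier to .*? makes it non-greedy, so it stops before the first digit: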

>>> re.findall('<span>([a-zA-Z]+)</span>(.*?)([0-9]+)',string)
[('Name', '</p><div class="info"><span>', '100')]
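
To see the difference side by side, here is a small sketch that runs both patterns on the string from the question. The variable names `greedy` and `lazy` are only for illustration, and the patterns are written as raw strings, which does not change their meaning here.

```python
import re

string = '><span>Name</span></p><div class="info"><span>100 years old<'

# Greedy: .* first runs to the end of the string, then gives back one
# character at a time until [0-9]+ can match. That happens as soon as a
# single digit is available, so only the last digit is left for group 3.
greedy = re.findall(r'<span>([a-zA-Z]+)</span>(.*)([0-9]+)', string)

# Non-greedy: .*? grows one character at a time and stops as soon as
# [0-9]+ can match, which is right before the first digit. [0-9]+ is
# still greedy, so it takes the whole run of digits.
lazy = re.findall(r'<span>([a-zA-Z]+)</span>(.*?)([0-9]+)', string)

print(greedy)  # [('Name', '</p><div class="info"><span>10', '0')]
print(lazy)    # [('Name', '</p><div class="info"><span>', '100')]
```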
Sean Vieira

Because the ".*" is consuming some of the digits.

You can try this instead:

"([a-zA-Z]+)(\\D*)([\\d]+)"

NOTE: I do not know whether you need to escape the "\".
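
For what it's worth, here is how this idea might look in Python. This is a sketch, not the exact pattern above: the `anchored` variant adds the `<span>...</span>` wrapper from the question, because without it the match starts at the tag name. In a Python raw string the backslash does not need doubling.

```python
import re

string = '><span>Name</span></p><div class="info"><span>100 years old<'

# '\\D' in a normal string and r'\D' in a raw string are the same regex.
assert '\\D' == r'\D'

# Unanchored: the first run of letters in the string is the tag name
# "span", so that is what ends up in group 1.
unanchored = re.findall(r'([a-zA-Z]+)(\D*)(\d+)', string)

# Anchored with the <span>...</span> wrapper from the question (my
# adaptation). \D* is greedy but cannot consume digits, so it has to
# stop at the first digit and \d+ gets the whole number.
anchored = re.findall(r'<span>([a-zA-Z]+)</span>(\D*)(\d+)', string)

print(unanchored)  # [('span', '>Name</span></p><div class="info"><span>', '100')]
print(anchored)    # [('Name', '</p><div class="info"><span>', '100')]
```

Note that \D* only works if no digits appear between the name and the number you want; if they can, the non-greedy .*? is the safer choice.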

Logan Murphy