
I have the following problem: I have some HTML code and I need to get the content of a tag. I don't want to chain lots of substring calls or anything like that. I want to use a regex, but I am having trouble matching tags that have classes, ids, and so on, as well as tags with no attributes at all. Here's my regex:

match = re.search('(?<=<span(.+)?>)(.*)(?=</span>)', '<span class="red">color</span>')

Python throws the following error:

sre_constants.error: look-behind requires fixed-width pattern

I want to get the content from

<span class="red">color</span>

and from

<span>color</span>

Thanks, everyone, for the help!

Václav Pavlíček

1 Answer


The simple answer: use findall, skip the look-behind, and take the capture group that holds the content (the second one in the pattern below).

<span(.+)?>(.*?)</span>
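As a minimal sketch of how that pattern could be used (assuming Python 3's re module; the names pattern, samples and contents are only illustrative), findall returns one tuple per match, with the attribute text in the first group and the tag content in the second:

```python
import re

# The pattern from above: group 1 is the optional attribute text,
# group 2 is the content between the opening and closing tag.
pattern = re.compile(r'<span(.+)?>(.*?)</span>')

# The two inputs from the question.
samples = ['<span class="red">color</span>', '<span>color</span>']

contents = []
for html in samples:
    # With two groups in the pattern, findall yields (attrs, content) tuples.
    for _attrs, content in pattern.findall(html):
        contents.append(content)

print(contents)
```

Both inputs from the question yield the content color here, because no look-behind is involved and the tag itself is simply matched and discarded.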

But this will fail in many cases, e.g. nested tags, a string containing the text </span>, and so on.
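To illustrate those limits, here is a small sketch (again assuming Python 3's re module; the test strings and variable names are made up for the example). It also tries a slightly tighter variant, `<span[^>]*>`, which is my own assumption rather than part of the pattern above: it copes with several spans on one line but still breaks on nesting.

```python
import re

# The pattern from above; (.+)? is greedy and may swallow other tags.
pattern = re.compile(r'<span(.+)?>(.*?)</span>')
# Hypothetical tighter variant: attributes may not contain '>'.
tighter = re.compile(r'<span[^>]*>(.*?)</span>')

two_spans = '<span class="a">one</span><span>two</span>'
nested = '<span><span>inner</span> outer</span>'

# Greedy attribute group: the first span is absorbed into group 1.
two_spans_original = [content for _attrs, content in pattern.findall(two_spans)]
# Tighter variant: both contents are found.
two_spans_tighter = tighter.findall(two_spans)

# Nested tags: neither pattern recovers the real structure.
nested_original = [content for _attrs, content in pattern.findall(nested)]
nested_tighter = tighter.findall(nested)

print(two_spans_original, two_spans_tighter)
print(nested_original, nested_tighter)
```

For anything beyond simple, flat markup, a real HTML parser is usually the safer choice; the regex only works as long as the input stays as regular as the two examples in the question.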

SamWhan