
I have a regular expression that searches for a div with a specific class and extracts the text inside it.

(?<=<div\ class="value.*?">\s+).*?(?=\s+</div>)

The problem is that it leaves whitespace at the beginning of the tag.

Example:

<div class="value odd">          THIS IS MY TAG                 </div>

For now, my expression removes the whitespace only after the tag, but not at the beginning.

How can I remove it at the beginning?

I need to get only: THIS IS MY TAG

Den Andreychuk
  • 2
    Why not use an HTML parser like `html.parser` or `lxml`, or a metaparser like BeautifulSoup, to get the tag contents as a string, then just `strip()` that string? – abarnert Aug 25 '18 at 22:37
  • 2
    Rather than `(?<=A\s+).*?(?=\s+B)` use `A\s*(.*?)\s*B`, but since it is HTML, there are better ways to handle this kind of input. – Wiktor Stribiżew Aug 25 '18 at 22:38
  • For example: `soup = bs4.BeautifulSoup(text)`, then `div = soup.find('div', class_=('value', 'odd'))`, then `text = div.text.strip()`. – abarnert Aug 25 '18 at 22:40
  • I know that this can be done much easier, but in my task I need to use only the regular expression:( – Den Andreychuk Aug 25 '18 at 22:44
  • Your regex cannot even be compiled: `sre_constants.error: look-behind requires fixed-width pattern`. – DYZ Aug 25 '18 at 23:01
  • https://regexr.com/3ufds – Den Andreychuk Aug 25 '18 at 23:04
  • Possible duplicate of [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Daniel Pryden Aug 25 '18 at 23:05
  • The link that you provided validates regular expressions for JavaScript and PHP but not Python. – DYZ Aug 25 '18 at 23:06
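
The compile error mentioned in the comments can be reproduced with a short check. This is only a minimal sketch, assuming Python 3 and the standard `re` module; the variable names (`pattern`, `compiled`, `message`) are illustrative and not from the original post.

```python
import re

# The pattern from the question. The look-behind contains `.*?` and `\s+`,
# so its width is not fixed.
pattern = r'(?<=<div\ class="value.*?">\s+).*?(?=\s+</div>)'

try:
    re.compile(pattern)
    compiled = True
    message = ""
except re.error as exc:
    # Python's re module rejects look-behinds of variable width.
    compiled = False
    message = str(exc)

print(compiled, message)
```

Other regex flavors may accept variable-width look-behinds, which would explain why the same pattern validates in an online tester but not in Python.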

1 Answer


You do not need any look-aheads or look-behinds. Match the surrounding markup and capture the text in a group instead. A regex that works for the example is:

'<div class="value.*?">\s+(.*?)\s+</div>'
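
As a rough illustration of how this pattern might be used, here is a minimal sketch with Python's standard `re` module. The sample string is the one from the question; the names `html`, `match`, and `tag_text` are assumptions made for the example.

```python
import re

# Sample input taken from the question.
html = '<div class="value odd">          THIS IS MY TAG                 </div>'

# The surrounding whitespace is consumed by \s+ outside the group,
# so group 1 holds only the text between the padding.
pattern = re.compile(r'<div class="value.*?">\s+(.*?)\s+</div>')

match = pattern.search(html)
tag_text = match.group(1) if match else None
print(tag_text)
```

Note that `\s+` requires at least one whitespace character on each side. If some divs have no padding, `\s*` (as suggested in the comments) is probably the safer choice.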
DYZ