Remove all whitespaces

Question

I have a regular expression that searches for a special class and outputs a tag.

(?<=<div\ class="value.*?">\s+).*?(?=\s+</div>)

The problem is that it leaves whitespace at the beginning of the matched text.

Example:

<div class="value odd">          THIS IS MY TAG                 </div>

For now, my expression removes the whitespace only after the tag text, not before it.

How can I remove it at the beginning as well?

I need to get only: THIS IS MY TAG

Why not use an HTML parser like `html.parser` or `lxml`, or a metaparser like BeautifulSoup, to get the tag contents as a string, then just `strip()` that string? — abarnert, Aug 25 '18 at 22:37
Rather than `(?<=A\s+).*?(?=\s+B)` use `A\s*(.*?)\s*B`, but since it is HTML, there are better ways to handle this kind of input. — Wiktor Stribiżew, Aug 25 '18 at 22:38
For example: `soup = bs4.BeautifulSoup(text)`, then `div = soup.find('div', class_=('value', 'odd'))`, then `text = div.text.strip()`. — abarnert, Aug 25 '18 at 22:40
I know that this can be done much more easily, but my task requires me to use only a regular expression :( — Den Andreychuk, Aug 25 '18 at 22:44
Your regex cannot even be compiled: `sre_constants.error: look-behind requires fixed-width pattern`. — DYZ, Aug 25 '18 at 23:01
Possible duplicate of [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — Daniel Pryden, Aug 25 '18 at 23:05
The link that you provided validates regular expressions for JavaScript and PHP but not Python. — DYZ, Aug 25 '18 at 23:06

score -1 · Answer 1 · answered Aug 25 '18 at 23:03

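You do not need any look-aheads or look-behinds. Match the surrounding tags directly and put the text you want in a capture group; the trimmed text is then group 1 of the match. A correct regex is: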

'<div class="value.*?">\s+(.*?)\s+</div>'
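As a minimal sketch of how this could be used from Python's `re` module: the helper name `extract_value` and the sample string are illustrative only, and `\s+` is relaxed to `\s*` here (following Wiktor Stribiżew's comment) so that content with no surrounding whitespace still matches. That relaxation is an assumption, not part of the regex above.

```python
import re

# Capture-group version of the pattern. \s* instead of \s+ is an
# assumption, so that content without padding is matched as well.
DIV_VALUE = re.compile(r'<div class="value.*?">\s*(.*?)\s*</div>')


def extract_value(html):
    """Return the trimmed text of the first matching div, or None."""
    match = DIV_VALUE.search(html)
    return match.group(1) if match else None


sample = '<div class="value odd">          THIS IS MY TAG                 </div>'
print(extract_value(sample))  # THIS IS MY TAG
```

Because `.` does not match newlines by default, text that spans several lines inside the div would need `re.DOTALL`; and, as the comments point out, a real HTML parser is more robust when the task allows one.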

DYZ
