
I was playing around with pattern matching on the HTML source of different sites when I noticed something weird. I used this pattern:

pat = r'<div class="id-app-orig-desc">.*</div>'

I ran it on an app page of the Play Store (I picked a random app). As I understand it, it should return only what's between the div tags (i.e. the description), but that does not happen. It returns everything from the first occurrence of the pattern to the very end of the page, completely ignoring whatever is in between. Does anyone know what's happening?

And when I check the length of the returned list, it's just 1.

alecxe
Ishan Garg

1 Answer


First of all, do not parse HTML with regex; use a specialized tool instead: an HTML parser. For example, BeautifulSoup:

from bs4 import BeautifulSoup

data = """
<div>
    <div class="id-app-orig-desc">
        Do not try to get me with a regex, please.
    </div>
</div>
"""

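# Name the parser explicitly to avoid the "no parser specified" warning.
soup = BeautifulSoup(data, "html.parser")
# Find the first div with the target class and print its stripped text.
print(soup.find('div', {'class': 'id-app-orig-desc'}).text.strip())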

Prints:

Do not try to get me with a regex, please.
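
As for why the original pattern ran to the end of the page: the likely cause is that `.*` is greedy, so it extends the match to the last `</div>` it can reach rather than the first. The sketch below assumes the pattern was applied with `re.findall` and that the markup sits on a single line (or that `re.DOTALL` was set). The `html` string is a made-up stand-in, not the real Play Store page.

```python
import re

# Hypothetical single-line fragment standing in for the real page source.
html = (
    '<div><div class="id-app-orig-desc">The description.</div>'
    '<div class="other">Unrelated content.</div></div>'
)

# Greedy: ".*" consumes as much as possible, so the match only stops
# at the LAST "</div>" on the line.
greedy = re.findall(r'<div class="id-app-orig-desc">.*</div>', html)

# Lazy: ".*?" consumes as little as possible, so the match stops
# at the FIRST "</div>" after the opening tag.
lazy = re.findall(r'<div class="id-app-orig-desc">(.*?)</div>', html)

print(len(greedy))  # one big match, as observed in the question
print(greedy[0])    # runs through the unrelated div to the end
print(lazy)         # only the description text
```

Note that the lazy version still breaks as soon as the description itself contains a nested `div`, which is one more reason to prefer a parser.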
alecxe