Pattern match not working as expected in Python

Question

I was playing around with pattern matching on the HTML of different sites and noticed something weird. I used this pattern:

pat = <div class="id-app-orig-desc">.*</div>

I used it on an app page of the Play Store (I picked a random app). As I understand it, it should return just what's between the div tags (i.e. the description), but that does not happen. It returns everything from the first occurrence of the pattern to the end of the page, completely ignoring everything in between. Does anyone know what's happening?

And when I check the length of the list returned, it's just 1.

You need to escape the double quotes and use the `s` modifier to make the dot match a newline. — Avinash Raj, Jul 01 '14 at 14:20
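The behaviour described in the question is what a greedy `.*` would be expected to do: it consumes as much as it can and then backtracks only as far as the last `</div>` it can reach, so the whole span comes back as a single match. Here is a minimal sketch of the difference between the greedy and non-greedy forms, assuming Python's `re` module with `re.DOTALL` as the equivalent of the `s` modifier. The `html` string is a made-up sample, not the real Play Store markup.

```python
import re

# Hypothetical sample page; the real Play Store markup will differ.
html = (
    '<div class="id-app-orig-desc">First description</div>'
    '<div class="other">Unrelated</div>'
)

# Greedy: .* runs to the LAST </div>, so there is one long match.
greedy = re.findall(r'<div class="id-app-orig-desc">.*</div>', html, re.DOTALL)

# Non-greedy: .*? stops at the FIRST </div> after the opening tag.
lazy = re.findall(r'<div class="id-app-orig-desc">(.*?)</div>', html, re.DOTALL)

print(len(greedy))  # 1
print(lazy)         # ['First description']
```

Even the non-greedy form stops at the first `</div>`, so it would cut the text short if the description contained nested divs. That limitation is one reason an HTML parser is the safer choice.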

score 0 · Accepted Answer · edited May 23 '17 at 12:14

First of all, do not parse HTML with regex; use a specialized tool, an HTML parser. For example, BeautifulSoup:

from bs4 import BeautifulSoup

data = """
<div>
    <div class="id-app-orig-desc">
        Do not try to get me with a regex, please.
    </div>
</div>
"""

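# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")
# Find the div by its class and print its text without surrounding whitespace
print(soup.find('div', {'class': 'id-app-orig-desc'}).text.strip())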

Prints:

Do not try to get me with a regex, please.
