How can I match the price in this string with a Python regex?

Question

How can I match the price in this string?

    <div id="price_amount" itemprop="price" class="h1 text-special">
      $58
    </div>

I want to extract the $58 from this string. How can I do that? This is what I am trying, but it doesn't work:

    regex = r'<div id="price_amount" itemprop="price" class="h1 text-special">(.+?)</div>'
    price = re.findall(regex, string)

Refer to the answer [here](http://stackoverflow.com/questions/849912/python-regex-how-to-find-a-string-between-two-sets-of-strings) — Sawal Maskey, Jun 11 '14 at 06:07

score 2 · Accepted Answer · answered Jun 11 '14 at 06:02

You really should not use regex for this particular problem. Look into an XML/HTML parsing library for Python instead.
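As a rough illustration of the parser-based approach, here is a minimal sketch using only the standard library's html.parser module. The answer does not name a specific library, so this choice is an assumption, and the names PriceParser and extract_price are hypothetical helpers invented for this example.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects the text inside the element whose id is "price_amount".

    Hypothetical helper for illustration; it does not handle void tags
    such as <br> nested inside the target element.
    """

    def __init__(self):
        super().__init__()
        self._depth = 0    # > 0 while we are inside the target element
        self.chunks = []   # text fragments found inside it

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1    # a nested tag inside the target
        elif dict(attrs).get("id") == "price_amount":
            self._depth = 1     # entered the target element

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.chunks.append(data)


def extract_price(html):
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    # Surrounding newlines and indentation are removed here, not by a regex.
    return "".join(parser.chunks).strip()


html = """
    <div id="price_amount" itemprop="price" class="h1 text-special">
      $58
    </div>
"""
print(extract_price(html))  # $58
```

Because the lookup is keyed on the id attribute alone, it keeps working if the class list or the attribute order changes, which is where a literal regex on the opening tag would break.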

Having said that, your regex fails only because it does not account for the newlines: by default, . does not match a newline character. Add \s* after the opening tag and before the closing tag.

    import re

    string = """
        <div id="price_amount" itemprop="price" class="h1 text-special">
          $58
        </div>
        """
    # \s* on both sides of the group absorbs the newlines and indentation.
    regex = r'<div id="price_amount" itemprop="price" class="h1 text-special">\s*(.+?)\s*</div>'
    price = re.findall(regex, string)
    print(price)  # ['$58']
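To make the newline issue concrete, the sketch below runs the question's original pattern with and without the re.DOTALL flag. Using re.DOTALL plus strip() is an alternative to the \s* fix, not something the answer proposes; the variable names are chosen for this example only.

```python
import re

html = """
    <div id="price_amount" itemprop="price" class="h1 text-special">
      $58
    </div>
"""

open_tag = r'<div id="price_amount" itemprop="price" class="h1 text-special">'

# Without flags, "." cannot match the newline right after the opening tag,
# so the pattern from the question finds nothing.
no_flag = re.findall(open_tag + r'(.+?)</div>', html)

# With re.DOTALL, "." also matches newlines. The capture then includes the
# surrounding whitespace, which has to be stripped afterwards.
dotall = [m.strip() for m in re.findall(open_tag + r'(.+?)</div>', html, re.DOTALL)]

print(no_flag)  # []
print(dotall)   # ['$58']
```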

merlin2011

You might want to use the non-greedy versions – thefourtheye Jun 11 '14 at 06:04
@thefourtheye, Actually, then you would match a whole bunch of extra whitespace inside the capture, which I assume the OP doesn't want. – merlin2011 Jun 11 '14 at 06:08
Is the reason to use an XML/HTML parser that it is more accurate and faster? – Liao Zhuodi Jun 11 '14 at 06:27
@liaozd, It is faster, more reliable, and generally not as much of a headache. Regular expressions were not designed to parse XML. – merlin2011 Jun 11 '14 at 06:30
@liaozd, [Here](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) is some entertaining and also good reference material on the matter. – merlin2011 Jun 11 '14 at 06:31
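The exchange above about non-greedy quantifiers can be checked directly. This sketch assumes the suggestion was to make the \s* parts lazy (\s*?); that reading of the comment is an assumption. With a lazy \s*?, the capture group ends up holding the leading indentation, which matches the objection raised in the reply.

```python
import re

html = """
    <div id="price_amount" itemprop="price" class="h1 text-special">
      $58
    </div>
"""

open_tag = r'<div id="price_amount" itemprop="price" class="h1 text-special">'

# Greedy \s* consumes the newline and all indentation before the group starts.
greedy_ws = re.findall(open_tag + r'\s*(.+?)\s*</div>', html)

# Lazy \s*? consumes as little as it can (just the newline), so the
# indentation in front of the price lands inside the capture group.
lazy_ws = re.findall(open_tag + r'\s*?(.+?)\s*?</div>', html)

print(greedy_ws)  # ['$58']
print(lazy_ws)    # the price preceded by its leading spaces
```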

Avinash Raj · Answer 2 · 2014-06-11T06:44:08.607

Try capturing only the price between the <div> and </div> tags:

    import re

    # Named html rather than str so the built-in str type is not shadowed.
    html = ('<div id="price_amount" itemprop="price" class="h1 text-special">'
            '$58'
            '</div>')
    regex = r'<div id="price_amount" itemprop="price" class="h1 text-special">([^<]*?)</div>'
    price = re.search(regex, html)
    price.group(1)  # => '$58'

([^<]*?) matches any character other than <, zero or more times, and stores the matched text in a capture group (group 1). The ? following the * makes the match non-greedy.
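One detail worth noting: [^<] also matches newlines, so on the multi-line markup from the question the group captures the surrounding whitespace along with the price. A minimal sketch of handling that, with a guard for the case where re.search finds nothing (the variable names raw and price are chosen for this example):

```python
import re

html = """
    <div id="price_amount" itemprop="price" class="h1 text-special">
      $58
    </div>
"""

regex = r'<div id="price_amount" itemprop="price" class="h1 text-special">([^<]*?)</div>'
match = re.search(regex, html)

# re.search returns None when nothing matches, so check before calling group().
raw = match.group(1) if match else None

# The raw capture still contains the newlines and indentation around the price.
price = raw.strip() if raw is not None else None

print(repr(raw))
print(price)  # $58
```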
