Regex returning nothing in Python

Question

I'm working in Python for the first time. I've used Mechanize to search a website and BeautifulSoup to select a particular div, and now I'm trying to grab a specific sentence with a regular expression. These are the soup object's contents:

    <div id="results">
   <table cellspacing="0" width="100%">
     <tr>
       <th align="left" valign="middle" width="32%">Physician Name, (CPSO#)</th>
       <th align="left" valign="middle" width="36%">Primary Practice Location</th>
       <!-- <th width="16%" align="center" valign="middle">Accepting New Patients?</th> --> 
       <th align="center" valign="middle" width="32%">Disciplinary Info  &amp; Restrictions</th>
     </tr>

    <tr>
        <td>
            <a class="doctor" href="details.aspx?view=1&amp;id= 85956">Hull, Christopher Merritt </a> (#85956)
        </td>
        <td>Four Counties Medical Clinic<br/>1824 Concessions Dr<br/>Newbury ON  N0L 1Z0<br/>Phone: (519) 693-0350<br/>Fax: (519) 693-0083</td>
        <!-- <td></td> --> 
        <td align="center"></td>
    </tr>
  </table>
</div>

(Thank you for the assistance with formatting)

My regular expression to get the text "Hull, Christopher Merritt" is:

patFinderName = re.compile('<a class="doctor" href="details.aspx?view=1&amp;id= 85956">(.*) </a>')

It keeps returning empty and I can't figure out why. Does anybody have any ideas?

Thank you for the answers. I've changed it to:

patFinderName = re.compile('<a class="doctor" href=".*">(.*) </a>')

Now it works beautifully.

possible duplicate of [Python regular expression for HTML parsing (BeautifulSoup)](http://stackoverflow.com/q/55391/), [Python Reg Ex. problem](http://stackoverflow.com/q/90052/), and probably [others](http://stackoverflow.com/search?q=%2Bbeautifulsoup+%2Bfind+%2Belement&submit=search) — outis, Jun 12 '12 at 05:36

score 3 · Accepted Answer · answered Jun 12 '12 at 01:39

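? is a special token in regular expressions, meaning zero or one of the preceding atom. In your pattern, "aspx?" therefore matches "asp" or "aspx" and never the question mark itself. Since you want a literal question mark, you need to escape it with a backslash (\?).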
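A minimal sketch of the difference, assuming the anchor tag from the question is held in a string (the variable name `html` is made up here for illustration):

```python
import re

# The anchor tag from the question, stored in a hypothetical variable.
html = ('<a class="doctor" href="details.aspx?view=1&amp;id= 85956">'
        'Hull, Christopher Merritt </a> (#85956)')

# Unescaped: "x?" means "zero or one x", so this pattern expects
# "details.asp" or "details.aspx" followed directly by "view=1".
# The literal "?" in the HTML is never consumed, so nothing matches.
unescaped = re.compile(
    '<a class="doctor" href="details.aspx?view=1&amp;id= 85956">(.*) </a>')

# Escaped: "\?" matches a literal question mark. A raw string (r'...')
# keeps Python from interpreting the backslash itself.
escaped = re.compile(
    r'<a class="doctor" href="details.aspx\?view=1&amp;id= 85956">(.*) </a>')

no_match = unescaped.search(html)   # None
match = escaped.search(html)
name = match.group(1)               # 'Hull, Christopher Merritt'
```

Only the question mark is escaped here to keep the change minimal; the dot in "details.aspx" is also a special character (it matches any character), which happens to be harmless in this case.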

Chris Morgan

Ah, I had no idea. Thank you, I'm new to regular expressions and something like that hadn't even crossed my mind. – user1094705 Jun 12 '12 at 01:40

score 0 · Answer 2 · answered Jun 12 '12 at 01:39

You should escape the ? in your regex:

In [8]: re.findall(r'<a class="doctor" href="details.aspx\?view=1&amp;id= 85956">(.*)</a>', text)
Out[8]: ['Hull, Christopher Merritt ']
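If escaping by hand feels error-prone, the standard library's `re.escape` can do it for the literal part of the pattern. This is only a sketch under the assumption that `text` holds the anchor tag from the question; the `prefix` variable and the lazy `(.*?)\s*` capture (used to drop the trailing space) are illustrative choices, not part of the original answer:

```python
import re

# Assumed input: the anchor tag from the question.
text = ('<a class="doctor" href="details.aspx?view=1&amp;id= 85956">'
        'Hull, Christopher Merritt </a> (#85956)')

# The literal opening tag, exactly as it appears in the HTML.
prefix = '<a class="doctor" href="details.aspx?view=1&amp;id= 85956">'

# re.escape() backslash-escapes every regex metacharacter in the
# literal ("?", "." and so on), so it is matched character for character.
# (.*?) is lazy and \s* absorbs the whitespace before the closing tag.
pattern = re.compile(re.escape(prefix) + r'(.*?)\s*</a>')

names = pattern.findall(text)
```

With the sample tag above, `names` holds a single entry, the physician's name without the trailing space.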

satoru

Both answers were great but he responded first, sorry. Though thank you for the formatting help. – user1094705 Jun 12 '12 at 01:53
@user1094705 Yes, I was editing your post while others were answering your question. – satoru Jun 12 '12 at 02:54
