
I am fairly new to regex and am trying to match the following text:

<div class="name">

    <a href="/rd/?S=1401191307481569663391991831690328817&I=&DS=42639&T=55&U=http%3A%2F%2Fwww.spokeo.com%2Fmapview%2Fperson%2F18643819031%3Fpx%3D%26piplstart%3D%26q%3DJoe%2BHenderson%2C%2BPhoenix%2C%2BAZ%26g%3Dname_piplv2_scd_city01&P=">
        <span class="highlight"> … </span>

         T 

        <span class="highlight"> … </span>

        , E Flower St, 

        <span class="highlight"> … </span>

        , 

        <span class="highlight"> … </span>

        , 

        <span class="highlight"> … </span>

        , 50 years old

    </a>

</div>
<div class="url">

    www.spokeo.com/mapview/person/18643819031?px=&piplstart=&q=Joe+Hend...

</div>

The expression I came up with is:

("<div class=\"name\">[\S\s]+</div><div class=\"url\">[\S\s]+</div>") 

However no matches are found. Any help is appreciated.

Peter Lazarov


You have a newline here:

</div>
<div class="url">

But you don't have one in your regex:

         |
         V
...</div><div...

Try adding \s* there. In Python, \s matches newlines as well as spaces and tabs, so this works as long as the two divs always follow one another with nothing but whitespace in between.
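A minimal sketch of that fix in Python, assuming the page is held in a string. The html value below is a shortened, hypothetical sample with the same layout as the question (including the newline between the two divs). The original pattern finds nothing; with \s* inserted it matches. Switching [\S\s]+ to the lazy [\S\s]+? is an extra choice made here, so that a page with several results gives one match per result instead of one match spanning all of them.

```python
import re

# Hypothetical, shortened sample with the same layout as the question:
# note the newline between "</div>" and '<div class="url">'.
html = """<div class="name">
    <a href="/rd/?S=1">Joe Henderson, 50 years old</a>
</div>
<div class="url">
    www.spokeo.com/mapview/person/18643819031
</div>"""

# The pattern from the question: it requires "</div><div" with nothing in between.
original = re.compile(r'<div class="name">[\S\s]+</div><div class="url">[\S\s]+</div>')

# The fix: allow optional whitespace (including newlines) between the two divs.
# The lazy "+?" stops each match at the first closing </div> that fits,
# so consecutive results are matched separately.
fixed = re.compile(r'<div class="name">[\S\s]+?</div>\s*<div class="url">[\S\s]+?</div>')

print(original.search(html))                    # None
print(fixed.search(html).group(0) == html)      # True

# Two results on one page: the lazy pattern yields two separate matches.
page = html + "\n" + html
print(len(fixed.findall(page)))                 # 2
```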

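But, as already mentioned, using regex to parse HTML is playing with fire: small changes in the markup (attribute order, extra whitespace, nested tags) can silently break a pattern like this.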
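For comparison, a minimal sketch of the parser-based route using Python's standard-library html.parser module, assuming the goal is to collect the text of each div with class "name" and class "url". The class name ResultExtractor is hypothetical, it only matches an exact class value, and the html sample is the same shortened, hypothetical one as above. A third-party parser would make this shorter but is not assumed here.

```python
from html.parser import HTMLParser


class ResultExtractor(HTMLParser):
    """Hypothetical helper: collects text inside <div class="name"> / <div class="url">."""

    def __init__(self):
        super().__init__()
        self.current = None   # "name" or "url" while inside a matching div
        self.depth = 0        # depth of nested divs inside the current block
        self.names = []
        self.urls = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.current is not None:
            self.depth += 1   # a nested div inside a block being collected
            return
        cls = dict(attrs).get("class")   # exact class value only
        if cls in ("name", "url"):
            self.current = cls
            self.depth = 0
            self._buf = []

    def handle_endtag(self, tag):
        if tag != "div" or self.current is None:
            return
        if self.depth > 0:
            self.depth -= 1
            return
        # Collapse runs of whitespace (including newlines) in the collected text.
        text = " ".join("".join(self._buf).split())
        (self.names if self.current == "name" else self.urls).append(text)
        self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self._buf.append(data)


html = """<div class="name">
    <a href="/rd/?S=1">Joe Henderson, 50 years old</a>
</div>
<div class="url">
    www.spokeo.com/mapview/person/18643819031
</div>"""

parser = ResultExtractor()
parser.feed(html)
parser.close()
print(parser.names)  # ['Joe Henderson, 50 years old']
print(parser.urls)   # ['www.spokeo.com/mapview/person/18643819031']
```

Because the parser tracks tags rather than exact character sequences, the newline between the two divs that broke the regex makes no difference here.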

Bernhard Barker