
Hi, I'm having trouble with a regex.

This is part of the source:

    <div class="resultHeader googleHeader">
                            Wyniki z Google
                    </div>

                <div class="boxResult2  ">
                                                                <div class="box ">
                <div class="result">
                    <div class="link"> <a href="http://www.google.com/glass/start/"><b>Google Glass</b></a> </div>
                    <div class="source">
                        http://www.google.com/glass/start/

                            - <a rel="nofollow" href="query.html?hl=pl&amp;qt=related:http%3A%2F%2Fwww.google.com%2Fglass%2Fstart%2F">Podobne strony</a>
                                            </div><!-- source END -->
                                            <div class="desc">Thanks for exploring with us. The journey doesn&#39;t end here. You&#39;ll start to see <br />
future versions of <b>Glass</b> when they&#39;re ready (for now, no peeking).</div>
                                    </div><!-- result End -->
            </div><!-- box End -->
                                                                <div class="box ">
                <div class="result">
                    <div class="link"> <a href="http://pl.wikipedia.org/wiki/Google_Glass"><b>Google Glass</b> – Wikipedia, wolna encyklopedia</a> </div>
                    <div class="source">
                        http://pl.wikipedia.org/wiki/Google_Glass

                            - <a rel="nofollow" href="query.html?hl=pl&amp;qt=related:http%3A%2F%2Fpl.wikipedia.org%2Fwiki%2FGoogle_Glass">Podobne strony</a>
                                            </div><!-- source END -->
                                            <div class="desc"><b>Google Glass</b> to okulary o rozszerzonej rzeczywistości stworzone przez firmę <br />
Google. Okulary te mają docelowo mieć funkcje standardowego smartfona, ale&nbsp;...</div>
                                    </div><!-- result End -->
            </div><!-- box End -->

I want just the link between <a href=" and ">, like this:

http://www.google.com/glass/start/

I wrote this: '<div class="link"> <a href="([^ ]+)"', but it isn't working.

Wiktor Stribiżew
devbgs

1 Answer


Since you are coding this in Python, I can suggest a Beautiful Soup based solution.

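    from bs4 import BeautifulSoup

    html = 'YOUR STRING'  # the HTML source to search
    soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly

    # Each result link sits inside a <div class="link"> element.
    for div in soup.find_all('div', {'class': 'link'}):
        for a in div.find_all('a'):
            # Skip anchors that carry no href attribute.
            if a.has_attr('href'):
                print(a['href'])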

Based on your sample input, this outputs:

http://www.google.com/glass/start/
http://pl.wikipedia.org/wiki/Google_Glass
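
If you would rather stay with a regex, here is a minimal sketch using only the standard library. The `html` string is a hypothetical two-row sample shaped like the markup in the question, and the name `LINK_RE` is made up for illustration. The pattern uses `[^"]+` so the capture stops at the closing quote of the attribute, and `\s*` so the amount of whitespace between the tags does not matter. One possible cause of "not working" is calling `re.match`, which only matches at the very start of the string; `findall` scans the whole input.

```python
import re

# Hypothetical sample: two result rows shaped like the markup in the question.
html = '''
<div class="link"> <a href="http://www.google.com/glass/start/"><b>Google Glass</b></a> </div>
<div class="link"> <a href="http://pl.wikipedia.org/wiki/Google_Glass"><b>Google Glass</b></a> </div>
'''

# [^"]+ captures everything up to the closing quote of the href attribute;
# \s* tolerates any amount of whitespace between the <div> and the <a>.
LINK_RE = re.compile(r'<div class="link">\s*<a href="([^"]+)"')

# findall scans the whole string and returns the captured group of each match.
links = LINK_RE.findall(html)

for link in links:
    print(link)
```

Keep in mind that this depends on the exact attribute order and quoting in the page, so it will break more easily than the parser-based version if the markup changes.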
Wiktor Stribiżew