
I am web scraping with Python, using the BeautifulSoup library.

I have HTML markup like this:

<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
<tr class="deals" data-url="www.example3.com">
<span class="hotel-name">
<a href="www.example3.com"></a>
</span>
</tr>

I want to get the data-url or the href value from every <tr>. The href value would be preferable.

Here is a little snippet of my relevant code:

main_url =  "http://localhost/test.htm"
page  = requests.get(main_url).text
soup_expatistan = BeautifulSoup(page)

print (soup_expatistan.select("tr.deals").data-url)
# or  print (soup_expatistan.select("tr.deals").["data-url"])
alecxe
Umair Ayub

1 Answer


You can use the `tr.deals span.hotel-name a` CSS selector to get to the link:

from bs4 import BeautifulSoup

data = """
<tr class="deals" data-url="www.example.com">
<span class="hotel-name">
<a href="wwwexample2.com"></a>
</span>
</tr>
"""

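# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")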
print(soup.select('tr.deals span.hotel-name a')[0]['href'])

Prints:

wwwexample2.com
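
As for why the attempt in the question fails: `select()` returns a list of matching tags rather than a single tag, so attribute access has to happen on an individual element, and `data-url` is not a valid Python attribute name because of the hyphen. A minimal sketch of reading `data-url` directly, assuming the built-in `html.parser` (a stricter parser such as html5lib may discard `<tr>` tags that are not inside a `<table>`); the variable names and the `data-missing` attribute are illustrative only:

```python
from bs4 import BeautifulSoup

# Sample markup taken from the question
html = """
<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list of every match, even when there is only one
rows = soup.select("tr.deals")

# select_one() returns the first match, or None if nothing matches
first_row = soup.select_one("tr.deals")

# Attributes are read with dictionary-style access on a single tag
data_url = first_row["data-url"]

# .get() returns None instead of raising KeyError for an absent attribute
missing = first_row.get("data-missing")

print(data_url)
```

Dictionary-style access raises `KeyError` when the attribute is absent, so `.get()` is the safer choice when some rows may lack `data-url`.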

If you have multiple links, iterate over them:

for link in soup.select('tr.deals span.hotel-name a'):
    print(link['href'])
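
To walk the rows themselves and pick up both values per row, one option is to select each `tr.deals` first and then search inside it. This is only a sketch under the same assumption as the sample above (`html.parser`, markup as in the question); `results` is an illustrative name:

```python
from bs4 import BeautifulSoup

# Two rows, as in the question's markup
html = """
<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
<tr class="deals" data-url="www.example3.com">
<span class="hotel-name">
<a href="www.example3.com"></a>
</span>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
for row in soup.select("tr.deals"):
    # Search only inside the current row for its link
    link = row.select_one("span.hotel-name a")
    # Guard against rows that have no link at all
    href = link.get("href") if link is not None else None
    results.append((row.get("data-url"), href))

for data_url, href in results:
    print(data_url, href)
```

Searching from `row` rather than from `soup` keeps each `href` paired with the `data-url` of the row it belongs to.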
alecxe
  • This works ... Actually, I have multiple `<tr>`s in my markup ... How do I iterate over all `tr`s and find the `a` link? – Umair Ayub Nov 07 '14 at 14:31
  • http://stackoverflow.com/questions/26806808/charmap-codec-cant-encode-character-xae-while-scraping-a-webpage – Umair Ayub Nov 07 '14 at 17:31