Get value of attribute using CSS Selectors with BeautifulSoup

Question

I am web scraping with Python using the BeautifulSoup library.

I have HTML markup like this:

<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
<tr class="deals" data-url="www.example3.com">
<span class="hotel-name">
<a href="www.example3.com"></a>
</span>
</tr>

I want to get the data-url or the href value from every <tr>, preferably the href value.

Here is a little snippet of my relevant code:

import requests
from bs4 import BeautifulSoup

main_url = "http://localhost/test.htm"
page = requests.get(main_url).text
soup_expatistan = BeautifulSoup(page, "html.parser")

# Neither attempt works:
# print(soup_expatistan.select("tr.deals").data-url)
# print(soup_expatistan.select("tr.deals").["data-url"])
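Both attempts fail for ordinary Python reasons: select() returns a list of Tag objects rather than a single tag, .data-url is parsed as .data minus a variable named url, and .["data-url"] is a syntax error. Attributes are read with subscript syntax on each individual tag. A minimal sketch, assuming the markup from the question is held in a string (the markup variable is a stand-in for the fetched page) and parsed with the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Stand-in for the fetched page: the same markup as in the question.
markup = """
<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
<tr class="deals" data-url="www.example3.com">
<span class="hotel-name">
<a href="www.example3.com"></a>
</span>
</tr>
"""

soup = BeautifulSoup(markup, "html.parser")

rows = soup.select("tr.deals")      # a list of Tag objects, not a single Tag
first_url = rows[0]["data-url"]     # subscript syntax reads one attribute
all_urls = [row["data-url"] for row in rows]

print(first_url)
print(all_urls)
```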

alecxe · Accepted Answer · 2014-11-07T14:33:09.147


You can use the CSS selector tr.deals span.hotel-name a to get to the link:

from bs4 import BeautifulSoup

data = """
<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
"""

# Name a parser explicitly; without one, bs4 warns and picks whichever parser is installed.
soup = BeautifulSoup(data, "html.parser")
print(soup.select('tr.deals span.hotel-name a')[0]['href'])

Prints:

www.example2.com
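CSS attribute selectors can also filter on the attribute itself: a[href] matches only <a> elements that carry an href, so the subscript cannot raise KeyError on an anchor that lacks one. A sketch under that assumption; the second, href-less anchor in the snippet is hypothetical and only there to show the filtering:

```python
from bs4 import BeautifulSoup

data = """
<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
<a name="no-href"></a>
</span>
</tr>
"""

soup = BeautifulSoup(data, "html.parser")

# [href] keeps only anchors that actually have an href attribute,
# so the hypothetical <a name="no-href"> is skipped.
hrefs = [a["href"] for a in soup.select("tr.deals span.hotel-name a[href]")]

# The same idea applied to the row's own data-url attribute.
data_urls = [tr["data-url"] for tr in soup.select("tr.deals[data-url]")]

print(hrefs)
print(data_urls)
```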

If you have multiple links, iterate over them:

for link in soup.select('tr.deals span.hotel-name a'):
    print(link['href'])
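To collect both values from every row, one option is to select the tr.deals rows first and then search inside each row with select_one(), which returns the first match or None. This keeps each data-url paired with its own link. A sketch assuming the two-row markup from the question; rows without a matching link are skipped:

```python
from bs4 import BeautifulSoup

data = """
<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
<tr class="deals" data-url="www.example3.com">
<span class="hotel-name">
<a href="www.example3.com"></a>
</span>
</tr>
"""

soup = BeautifulSoup(data, "html.parser")

pairs = []
for row in soup.select("tr.deals"):
    # Search only within this row, so the link stays with its own data-url.
    link = row.select_one("span.hotel-name a")
    if link is None:
        continue  # no hotel link in this row
    # .get() returns None instead of raising KeyError if an attribute is missing.
    pairs.append((row.get("data-url"), link.get("href")))

for data_url, href in pairs:
    print(data_url, href)
```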

answered Nov 07 '14 at 14:22 by alecxe, edited Nov 07 '14 at 14:33


This works... Actually I have multiple `tr`s in my markup... How do I iterate over all `tr`s and find the `a` link? – Umair Ayub Nov 07 '14 at 14:31
http://stackoverflow.com/questions/26806808/charmap-codec-cant-encode-character-xae-while-scraping-a-webpage – Umair Ayub Nov 07 '14 at 17:31
