Web Scrape Second tag in single div

Question

I am trying to use BeautifulSoup to scrape the first and second <b> tags (-130 and +110) in a single HTML div.

However, I can't figure out how to scrape the second tag; I can only scrape the first. Thank you.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

day = "09"
month = "10"
year = "2017"
my_url = 'https://www.sportsbookreview.com/betting-odds/mlb-baseball/?date=' + year + month + day

# Opening up the connection and grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

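# Parse the page with Python's built-in HTML parser
page_soup = soup(page_html, "html.parser")

# All divs whose rel attribute is "999996"
allBovadaOdds = page_soup.find_all("div", {"rel": "999996"})

# allBovadaOdds[1] is the second matching div; .b returns only its FIRST <b> tag
firstOdds = allBovadaOdds[1].b.string
print(firstOdds)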

XPath is your friend. `//div/b[2]` will find the second `b` in each `div`; `//div[@rel="999996"]/b[2]` will find the second `b` only in a `div` that has a `rel` attribute with the given value (remove the `@` if you want to look for a child element rather than an attribute). — Charles Duffy, Apr 05 '18 at 16:17
@CharlesDuffy, maybe I'm missing something here, but, XPath can't be used with BeautifulSoup. — Keyur Potdar, Apr 05 '18 at 16:24
@KeyurPotdar, modern versions of BeautifulSoup use lxml as their default parser backend, and lxml very much does support XPath. — Charles Duffy, Apr 05 '18 at 16:36
Actually, I meant `lxml.html`. See http://lxml.de/elementsoup.html and https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser — Charles Duffy, Apr 05 '18 at 16:40
Thanks, got it. Sorry for the confusion. I was basing my comment on [this post](https://stackoverflow.com/a/11469854/7832176). — Keyur Potdar, Apr 05 '18 at 16:41
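Following the XPath suggestion in the comments, here is a minimal sketch that uses `lxml.html` directly, since BeautifulSoup objects themselves have no `xpath()` method. It assumes lxml is installed, and the HTML string is a hypothetical stand-in for the sportsbookreview.com markup, not the real page.

```python
# Minimal sketch: query the second <b> with XPath via lxml.html.
# SAMPLE_HTML is hypothetical markup, not the real sportsbookreview.com page.
import lxml.html

SAMPLE_HTML = """
<html><body>
  <div rel="999996"><b>-130</b> <b>+110</b></div>
  <div rel="123456"><b>-105</b> <b>-115</b></div>
</body></html>
"""

tree = lxml.html.fromstring(SAMPLE_HTML)

# b[2] = the second <b> child of each <div> whose rel attribute is "999996"
second_tags = tree.xpath('//div[@rel="999996"]/b[2]')
second_odds = [b.text for b in second_tags]
print(second_odds)  # ['+110']
```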

score 2 · Answer 1 · answered Apr 05 '18 at 18:58

What you want can be written fairly simply, I think.

>>> import bs4
>>> import requests
>>> page = requests.get('https://www.sportsbookreview.com/betting-odds/mlb-baseball/?date=20171009').text
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> soup.select('#eventLine-3330496-43 b')
[<b>-130</b>, <b>+110</b>]
>>> for item in soup.select('#eventLine-3330496-43 b'):
...     item.text
...     
'-130'
'+110'

However, I notice two potential problems:

- The labelling of the elements (i.e., the ids of divs, etc.) might vary from one load of the web page to the next.
- There are actually two columns with this same pair of values. It might be safer to identify the required items by the booking agent (sportsbook) and its number, for instance.
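One way to address both points is to select by the sportsbook's `rel` value instead of a generated element id, and to keep the two odds from each div together. This is a minimal sketch, assuming (as the question's code does) that `rel="999996"` identifies the sportsbook of interest; the HTML below is hypothetical sample markup, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
SAMPLE_HTML = """
<div rel="999996"><b>-130</b> <b>+110</b></div>
<div rel="999996"><b>-150</b> <b>+130</b></div>
<div rel="123456"><b>-105</b> <b>-115</b></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Select divs by the rel number rather than by an id like eventLine-...,
# then collect every <b> inside each matching div as one group.
odds_pairs = [
    [b.get_text(strip=True) for b in div.find_all("b")]
    for div in soup.select('div[rel="999996"]')
]
print(odds_pairs)  # [['-130', '+110'], ['-150', '+130']]
```

Each inner list keeps the first and second odds of one div together, so `pair[1]` is always the second tag of that div.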

score 1 · Answer 2 · answered Apr 05 '18 at 18:35

You can use soup.select() to filter the tags and then loop with for i in range(...) to get all of the second tags. Start range() at index 1 and use a step of 2; this assumes every matching div contains exactly two <b> tags.

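# html parser
page_soup = soup(page_html, "html.parser")

# Flat list of every <b> inside the matching divs: [first, second, first, second, ...]
allBovadaOdds = page_soup.select('div[rel="999996"] b')
print(allBovadaOdds)

# Start at index 1 and step by 2 to take only the second <b> of each pair
for i in range(1, len(allBovadaOdds), 2):
    SecondOdds = allBovadaOdds[i].string
    print(SecondOdds)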
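The range() approach relies on every matching div having exactly two <b> tags; if one div has only one, the pairing shifts for every div after it. A sketch of a per-div alternative uses the CSS `:nth-of-type(2)` pseudo-class, which BeautifulSoup's `select()` supports through the soupsieve package (bs4 4.7.0 and later). The sample HTML is hypothetical and deliberately includes a div with a single <b>.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the middle div has only one <b>
SAMPLE_HTML = """
<div rel="999996"><b>-130</b> <b>+110</b></div>
<div rel="999996"><b>-150</b></div>
<div rel="999996"><b>-120</b> <b>+100</b></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# b:nth-of-type(2) matches a <b> only if it is the second <b> among its siblings,
# so a div with a single <b> contributes nothing instead of shifting the pairs.
second_odds = [b.get_text() for b in soup.select('div[rel="999996"] b:nth-of-type(2)')]
print(second_odds)  # ['+110', '+100']
```

On this same sample, the range(1, len(...), 2) loop would print '+110' and '-120', mistaking a first tag for a second one.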
