
I am new to web scraping and am currently trying to understand it in order to automate a betting competition with friends on the German Bundesliga (the platform we use is kicktipp.de). I have already managed to log in to the website and post football results with Python. Unfortunately, those are just Poisson-distributed random numbers so far. To improve this, my idea is to download odds from bwin, more precisely the odds for the exact results. Here the problems start: so far I haven't been able to extract those with BeautifulSoup. Using Google Chrome I can see which part of the HTML code I need, but for some reason I cannot find those parts with BeautifulSoup. My code at the moment looks like this:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://sports.bwin.com/de/sports/4/wetten/fußball#categoryIds=192&eventId=&leagueIds=43&marketGroupId=&page=0&sportId=4&templateIds=0.8649061927316986"

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")
containers1 = page_soup.findAll("div", {"class": "marketboard-event-group__item--sub-group"})
print(len(containers1))
containers2 = page_soup.findAll("table", {"class": "marketboard-event-with-header__markets-list"})
print(len(containers2))

From the length of the containers I can already see that either they contain more items than I anticipated, or they are empty for unknown reasons... I hope you can guide me. Thanks in advance!

HighwayJohn
  • Does it show all of the tables like you expect when you print out `page_soup.prettify()`? Also, have you considered using requests instead of urllib.request? – Generic Guy Aug 25 '17 at 19:06

1 Answer


You can use Selenium together with ChromeDriver to scrape a page that generates its content with JavaScript, which is the case here.

from selenium import webdriver
from bs4 import BeautifulSoup

url = "https://sports.bwin.com/de/sports/4/wetten/fußball#categoryIds=192&eventId=&leagueIds=43&marketGroupId=&page=0&sportId=4&templateIds=0.8649061927316986"

# let Chrome render the JavaScript-generated content, then hand the DOM to BeautifulSoup
driver = webdriver.Chrome()
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

containers = soup.findAll("table", {"class": "marketboard-event-with-header__markets-list"})

Now containers really holds what we want: the table elements. Inspecting further, it's easy to see that the texts we're after sit in alternating <div> tags, so we can use zip together with iter to build a list of (result, odds) tuples by pairing up consecutive elements of the divs list:

resultAndOdds = []    
for container in containers:
    divs = container.findAll('div')
    texts = [div.text for div in divs]
    it = iter(texts)
    resultAndOdds.append(list(zip(it, it)))

Demo:

>>> resultAndOdds[0]
[('1:0', '9.25'), ('0:0', '7.25'), ('0:1', '7.50'), ('2:0', '16.00'), ('1:1', '6.25'), ('0:2', '10.00'), ('2:1', '11.50'), ('2:2', '15.00'), ('1:2', '9.25'), ('3:0', '36.00'), ('3:3', '51.00'), ('0:3', '19.50'), ('3:1', '26.00'), ('4:4', '251.00'), ('1:3', '17.00'), ('3:2', '36.00'), ('2:3', '29.00'), ('4:0', '126.00'), ('0:4', '51.00'), ('4:1', '101.00'), ('1:4', '41.00'), ('4:2', '151.00'), ('2:4', '81.00'), ('4:3', '251.00'), ('3:4', '251.00'), ('Jedes andere Ergebnis', '29.00')]
>>> resultAndOdds[1]
[('1:0', '5.00'), ('0:0', '2.65'), ('0:1', '4.10'), ('2:0', '15.50'), ('1:1', '7.25'), ('0:2', '10.50'), ('2:1', '21.00'), ('2:2', '67.00'), ('1:2', '18.00'), ('3:0', '81.00'), ('3:3', '251.00'), ('0:3', '36.00'), ('3:1', '126.00'), ('4:4', '251.00'), ('1:3', '81.00'), ('3:2', '251.00'), ('2:3', '251.00'), ('4:0', '251.00'), ('0:4', '201.00'), ('4:1', '251.00'), ('1:4', '251.00'), ('4:2', '251.00'), ('2:4', '251.00'), ('4:3', '251.00'), ('3:4', '251.00'), ('Jedes andere Ergebnis', '251.00')]
>>> len(resultAndOdds)
24
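Since the odds come back as strings, you may want to convert them to floats before doing any math with them. A minimal sketch (the helper name `parse_odds` is my own, not part of the code above):

```python
def parse_odds(pairs):
    """Convert the odds part of each (result, odds) tuple to float.

    Anything that doesn't parse as a number is kept as a string.
    """
    parsed = []
    for result, odds in pairs:
        try:
            parsed.append((result, float(odds)))
        except ValueError:
            parsed.append((result, odds))
    return parsed

# sample taken from the demo output above
sample = [('1:0', '9.25'), ('0:0', '7.25'), ('Jedes andere Ergebnis', '29.00')]
print(parse_odds(sample))
```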

Depending on how you want your data to be like, you can also get the titles of each table with something like:

titlesElements = soup.findAll("div", {"class":"marketboard-event-with-header__market-name"})
titlesTexts = [title.text for title in titlesElements]
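You could then pair each title with its table, assuming titles and tables appear in the same document order. Sketched here with placeholder data, since `titlesTexts` and `resultAndOdds` come from the live page:

```python
# placeholder data standing in for the scraped lists
titlesTexts = ["Team A - Team B", "Team C - Team D"]
resultAndOdds = [[('1:0', '9.25'), ('0:0', '7.25')],
                 [('1:0', '5.00'), ('0:0', '2.65')]]

# map each match title to its list of (result, odds) tuples
oddsByMatch = dict(zip(titlesTexts, resultAndOdds))
print(oddsByMatch["Team A - Team B"][0])
```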
Vinícius Figueiredo