I am very new to Python. My recent project is scraping data from a betting website. What I want to scrape is the odds information from the webpage.

Here is my code

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'http://bet.hkjc.com/default.aspx?url=football/odds/odds_allodds.aspx&lang=CH&tmatchid=120653'

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

page_soup.findAll("div",{"class":"oddsAll"})

But the result is `[]`, an empty list, so nothing was matched.

What should I do to make my code work?

emporerblk
Kenny
  • You can't scrape it like this; you have to specify which table and which div you want to extract. I just checked, and the div you used contains many inner divs and classes. Also, try `print(page_soup.prettify())` and check whether the output contains any class named "oddsAll". – Aaditya Ura Nov 06 '17 at 16:14
  • `page_soup.findAll("div", class_="oddsAll")` returns nothing; it seems there's no such class, whereas `page_soup.findAll("div", class_="dialog")` does return results. – Van Peer Nov 06 '17 at 16:15
  • Do you know exactly which odds you are looking for? The query in your code (`page_soup.findAll("div",{"class":"oddsAll"})`) doesn't match any element on the webpage you provided. – emporerblk Nov 06 '17 at 16:19
  • I am sorry, I am not familiar with HTML. If I want to get the odds for, say, Pachuca (the home team) to win, how can I do that? – Kenny Nov 06 '17 at 16:24
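
One way to act on the advice in these comments is to list the class names that actually occur in the HTML the server returned. This is a minimal sketch, assuming a small inline HTML string (`sample_html`, made up so the example runs offline) in place of the real page; `classes_in` is a hypothetical helper name.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the real HKJC markup).
sample_html = """
<div class="dialog"><p>Loading...</p></div>
<div class="footer note">footer</div>
"""


def classes_in(html, tag="div"):
    """Count every class name used on the given tag in the parsed HTML."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for element in soup.find_all(tag):
        # "class" is multi-valued, so BeautifulSoup returns it as a list.
        counts.update(element.get("class", []))
    return counts


counts = classes_in(sample_html)
print(sorted(counts))        # class names present on <div> elements
print("oddsAll" in counts)   # False: find_all would return [] for it
```

If the class you are searching for is missing from this list, the element is probably added later by JavaScript or lives in a different page.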

1 Answer

I changed the URL to the page that the original page loads via JavaScript, because that inner page is the one that actually contains the data. I also updated the tmatchid to a current match, 120998, and changed the search from a div to a table with the correct class, tOdds.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# URL of the inner page that actually contains the odds tables
my_url = 'http://bet.hkjc.com/football/odds/odds_allodds.aspx?lang=CH&tmatchid=120998'

# Download the raw HTML
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# Parse it and print the text of every table with class "tOdds"
page_soup = soup(page_html, "html.parser")
tables = page_soup.findAll("table", {"class": "tOdds"})
for table in tables:
    print(table.text)

Outputs:

燕豪芬青年隊(主隊勝) 和 烏德勒支青年隊(客隊勝)   1.53 4.00 4.60 
  燕豪芬青年隊(主隊勝) 和 烏德勒支青年隊(客隊勝)   1.97 2.45 4.70 
  燕豪芬青年隊[-1](主隊勝) 和 烏德勒支青年隊[+1](客隊勝)   2.45 3.60 2.26 
  球數 大 細  [3/3.5]2.021.70
  球數 大 細   [1.5]2.191.60
     1.44    18.00    2.65   
  0 1 2 3 4 5 6 7+   18.00 6.60 4.10 3.65 4.50 6.70 11.00 14.00 
  單 雙   1.90 1.80 
  主 主 主 和 和 和 客 客 客   主 和 客 主 和 客 主 和 客   2.30 14.00 34.00 4.70 6.50 10.50 19.00 14.00 7.50 
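
In the output above, some values run together (for example `2.021.70`) because `table.text` joins the text of every cell with no separator. Here is a sketch of reading a table cell by cell instead. It assumes a simplified, made-up table (`sample_html`) that borrows only the `tOdds` class name from the answer; the real page's row layout may differ, and `table_rows` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; only the class name "tOdds" comes from the answer.
sample_html = """
<table class="tOdds">
  <tr><td>Home</td><td>Draw</td><td>Away</td></tr>
  <tr><td>1.53</td><td>4.00</td><td>4.60</td></tr>
</table>
"""


def table_rows(table):
    """Return the table as a list of rows, each a list of stripped cell texts."""
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


soup = BeautifulSoup(sample_html, "html.parser")
for table in soup.find_all("table", class_="tOdds"):
    # The sample has exactly one label row and one value row.
    header, values = table_rows(table)
    odds = dict(zip(header, map(float, values)))
    print(odds)
```

With the values in a dictionary, the home-win price is simply `odds["Home"]`.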

UPDATED in response to comment:

In this case you need the URL of the frame showing the data. You can do something like this:

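import requests
from bs4 import BeautifulSoup

# Request the iframe page directly; it holds the head-to-head data
response = requests.get('http://football.hkjc.com/football/iframe/statistics/head-to-head/summary-iframe.aspx?ci=en-US')
# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(response.content, 'lxml')
# A list of class names matches elements carrying any one of them
divs = soup.findAll('div', {'class': ['win', 'draw', 'lose']})
for div in divs:
    print(div.get_text())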

Outputs:

18/03/2018 Italian Division 1 : Benevento 1-2 Cagliari
18/02/2018 Italian Division 1 : Benevento 3-2 Crotone
05/02/2018 Italian Division 1 : Benevento 0-2 Napoli
06/01/2018 Italian Division 1 : Benevento 3-2 Sampdoria
30/12/2017 Italian Division 1 : Benevento 1-0 Chievo
18/12/2017 Italian Division 1 : Benevento 1-2 SPAL
03/12/2017 Italian Division 1 : Benevento 2-2 AC Milan
19/11/2017 Italian Division 1 : Benevento 1-2 Sassuolo
29/10/2017 Italian Division 1 : Benevento 1-5 Lazio
22/10/2017 Italian Division 1 : Benevento 0-3 Fiorentina
31/03/2018 Italian Division 1 : Inter Milan 3-0 Verona
20/02/2018 Italian Division 1 : Lazio 2-0 Verona
11/02/2018 Italian Division 1 : Sampdoria 2-0 Verona
28/01/2018 Italian Division 1 : Fiorentina 1-4 Verona
06/01/2018 Italian Division 1 : Napoli 2-0 Verona
23/12/2017 Italian Division 1 : Udinese 4-0 Verona
14/12/2017 Italian Cup : AC Milan 3-0 Verona
10/12/2017 Italian Division 1 : SPAL 2-2 Verona
30/11/2017 Italian Cup : Chievo 1-1 Verona
26/11/2017 Italian Division 1 : Sassuolo 0-2 Verona
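
To look for such a frame URL programmatically, the `src` attributes of the page's `iframe` elements can be listed and resolved against the page address. This is a sketch under assumptions: `sample_html` is invented for the example, and its iframe path is modeled on the URL in the answer rather than copied from the live page. `iframe_urls` is a hypothetical helper name.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

page_url = "http://football.hkjc.com/football/statistics/english/head-to-head/summary.aspx?ci=en-us"

# Invented markup: the real page may embed its frame differently.
sample_html = """
<html><body>
  <iframe src="/football/iframe/statistics/head-to-head/summary-iframe.aspx?ci=en-US"></iframe>
  <iframe></iframe>
</body></html>
"""


def iframe_urls(html, base_url):
    """Return absolute URLs for every iframe that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True keeps only iframes that actually define a src attribute.
    return [urljoin(base_url, frame["src"]) for frame in soup.find_all("iframe", src=True)]


frame_urls = iframe_urls(sample_html, page_url)
print(frame_urls)
```

If the frame is injected by JavaScript after the page loads, it will not appear in the downloaded HTML, and the browser's network tab is the more reliable place to look.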
Dan-Dev
  • Hello Dan, I know you are a Python expert. May I ask one more question? What do you mean by "Updated the URL using JavaScript"? Can you do the same thing for the following URL? http://football.hkjc.com/football/statistics/english/head-to-head/summary.aspx?ci=en-us – Kenny Apr 04 '18 at 14:17
  • Updated the answer in response to this question. – Dan-Dev Apr 04 '18 at 15:39
  • Hello Dan, it's me again. Sorry to trouble you. Could you help with my other question, which is also about scraping? https://stackoverflow.com/questions/50964115/beautifulsoup-webpage-have-protection-and-prettify-return-no-data/50964309#50964309 – Kenny Jun 22 '18 at 03:23
  • You can try setting a user agent header (see https://stackoverflow.com/questions/50882732/accessing-hidden-tabs-web-scraping-with-python-3-6/50888787#50888787), but your IP is probably blocked. You could use a proxy, but they may block common proxies. You could use Stem and Tor, but Tor is a cesspit (see https://stackoverflow.com/questions/30286293/make-requests-using-python-over-tor/33875657). If they provide a paid API, they have probably made scraping very difficult or impossible. My advice would be to use the paid API; we all have to make a living. – Dan-Dev Jun 22 '18 at 19:27
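
Setting a user agent header, as suggested in the last comment, might look like the sketch below. It uses `urllib.request` as in the question; the header value and the `build_request` helper name are placeholders chosen for the example, and the request is only constructed here, not sent.

```python
from urllib.request import Request

# Placeholder value; substitute the string your own browser sends.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) example-scraper/0.1"


def build_request(url, user_agent=USER_AGENT):
    """Create a Request that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


request = build_request("http://bet.hkjc.com/football/odds/odds_allodds.aspx?lang=CH&tmatchid=120998")
# urllib stores header names capitalized, so the key becomes "User-agent".
print(request.get_header("User-agent"))

# To send it: urllib.request.urlopen(request).read()
```

As the comment notes, this only helps when the site filters on the header; it does nothing if the IP address itself is blocked.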