
Although the web scraper below works, it also picks up hyperlinks that are unrelated to the match tables on the page. I would like help limiting the selection to only the relevant tennis match hyperlinks inside the table with class "table-main only12 js-nrbanner-t".

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Fetch the results page for a single day
r = requests.get('https://www.betexplorer.com/results/tennis/?year=2022&month=11&day=02')
soup = BeautifulSoup(r.text, "html.parser")

# Every link whose href starts with /tennis and that contains a <strong> element, anywhere on the page
matchlist = set('https://www.betexplorer.com'+a.get('href') for a in soup.select('a[href^="/tennis"]:has(strong)'))

print(pd.DataFrame(matchlist))

Edit: Driftr95 has found the exact solution I was looking for, even though I didn't phrase the question correctly.

NewGuy1
  • Does this answer your question? [How to find elements by class](https://stackoverflow.com/questions/5041008/how-to-find-elements-by-class) – esqew Nov 08 '22 at 19:16
  • Where in the page are you getting unrelated links from with your current selector? Can you share some of those links as an example? – Driftr95 Nov 08 '22 at 21:38
  • @Driftr95 the problem with this program is that it includes tennis matches that appear on different days than the one specified on the webpage. – NewGuy1 Nov 09 '22 at 04:46
  • @NewGuy1 it seems that `requests.get` returns extra results from the days before and after, which are probably filtered out dynamically by the time the page finishes loading in the browser... please see the edit added to my answer – Driftr95 Nov 09 '22 at 06:07
  • @NewGuy1 btw, were you trying to filter by date all along? because then you should have made the question about that, or at least included it in the post - all three days' matches are in the same table in the html fetched by `requests.get`, so adding table to the selector doesn't narrow down the date at all. – Driftr95 Nov 09 '22 at 06:10

1 Answer


You can just add the table to the CSS selector you pass to `select`:

tLinkSel = 'table.table-main.only12.js-nrbanner-t a[href^="/tennis"]:has(strong)'
matchlist = set('https://www.betexplorer.com'+a.get('href') for a in soup.select(tLinkSel))

I should mention, though, that I did not see any difference in the results when searching in dev tools; this change only guarantees that the links come from inside the table.
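To illustrate what prefixing the table selector changes, here is a minimal sketch that runs both selectors against a small, hypothetical HTML snippet instead of the live page. The markup (a sidebar link plus a two-row table) is invented for the demo; only the table class and the link selector come from the question.

```python
from bs4 import BeautifulSoup

BASE_URL = "https://www.betexplorer.com"
TABLE_SEL = "table.table-main.only12.js-nrbanner-t"  # class from the question
LINK_SEL = 'a[href^="/tennis"]:has(strong)'           # selector from the question

# Hypothetical, heavily simplified markup: one tennis link outside the table,
# two match links inside it.
SAMPLE_HTML = """
<div class="sidebar">
  <a href="/tennis/atp-singles/"><strong>ATP Singles</strong></a>
</div>
<table class="table-main only12 js-nrbanner-t">
  <tr><td><a href="/tennis/atp-paris/a-b/abc123/"><strong>A</strong> - B</a></td></tr>
  <tr><td><a href="/tennis/wta-finals/c-d/def456/">C - <strong>D</strong></a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Original selector: matches qualifying links anywhere in the document
all_links = {BASE_URL + a["href"] for a in soup.select(LINK_SEL)}

# Answer's selector: the descendant combinator restricts matches to the table
table_links = {BASE_URL + a["href"] for a in soup.select(f"{TABLE_SEL} {LINK_SEL}")}

print(len(all_links), len(table_links))
```

On this sample the sidebar link is dropped by the scoped selector. On the real page, as noted above, the two may return the same set if every qualifying link already sits inside the table.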


Additional EDIT:

You can target specific dates with the `data-dt` attribute of the rows (`tr`); for example, for Nov 2, 2022, you can set

tLinkSel = 'tr[data-dt^="2,11,2022,"] a[href^="/tennis"]:has(strong)'
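If you need this for arbitrary days, the selector can be built from a `date`. The sketch below assumes, based only on the `"2,11,2022,"` example above, that `data-dt` starts with `day,month,year,` without zero padding; the helper names and the trailing fields in the sample rows are hypothetical. Because `^=` is a starts-with match, the prefix `2,11,2022,` does not match a row for the 12th or 22nd.

```python
from datetime import date
from bs4 import BeautifulSoup

BASE_URL = "https://www.betexplorer.com"

def date_link_selector(d: date) -> str:
    # Hypothetical helper. Assumes data-dt begins "day,month,year," with no
    # zero padding, as in the answer's "2,11,2022," example.
    return f'tr[data-dt^="{d.day},{d.month},{d.year},"] a[href^="/tennis"]:has(strong)'

def match_links_for(soup: BeautifulSoup, d: date) -> set:
    # Absolute URLs of match links in rows dated d
    return {BASE_URL + a["href"] for a in soup.select(date_link_selector(d))}

# Invented sample: rows for Nov 1-3 in one table, mimicking the extra days
# that requests.get returns. The fields after the year are made up.
SAMPLE_HTML = """
<table class="table-main only12 js-nrbanner-t">
  <tr data-dt="1,11,2022,18,00"><td><a href="/tennis/x/a-b/m1/"><strong>A</strong> - B</a></td></tr>
  <tr data-dt="2,11,2022,12,30"><td><a href="/tennis/x/c-d/m2/"><strong>C</strong> - D</a></td></tr>
  <tr data-dt="2,11,2022,15,00"><td><a href="/tennis/x/e-f/m3/">E - <strong>F</strong></a></td></tr>
  <tr data-dt="3,11,2022,09,00"><td><a href="/tennis/x/g-h/m4/"><strong>G</strong> - H</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
nov2 = match_links_for(soup, date(2022, 11, 2))
print(sorted(nov2))
```

With the live page you would pass the `soup` built from `r.text` in the question instead of the sample.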
Driftr95