What you did should work fine.
I ran it once, but then it stopped working.
The problem is that the website has an anti-scraping mechanism that blocks you if you make too many requests to their site.
What I would recommend you do is:
- add a userAgent() call to your Jsoup request in order to identify yourself as a scraper bot.
- read their Terms of Service to check if you are allowed to scrape their site.
- send them an email explaining your intentions and asking whether they are okay with you scraping parts of their site.
By the way, if you want to debug what is happening, what I did was change the Jsoup calls to:
String gotten_next_date =
Jsoup.connect("https://www.vividseats.com/nba-basketball/toronto-raptors-schedule.html").get().html();
This returns the HTML of the requested page, which, if you look at it, does not contain anything interesting:
<!doctype html>
<html>
<head>
<meta NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta http-equiv="cache-control" content="max-age=0">
<meta http-equiv="cache-control" content="no-cache">
<meta http-equiv="expires" content="0">
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT">
<meta http-equiv="pragma" content="no-cache">
<meta http-equiv="refresh" content="10; url=/distil_r_captcha.html?requestId=291c6193-eb12-4e96-b1cd-23ba9a75e659&httpReferrer=%2Fnba-basketball%2Ftoronto-raptors-schedule.html">
<script type="text/javascript">
(function(window){
try {
if (typeof sessionStorage !== 'undefined'){
sessionStorage.setItem('distil_referrer', document.referrer);
}
} catch (e){}
})(window);
</script>
<script type="text/javascript" src="/vvdstsdstl.js" defer></script>
<style type="text/css">#d__fFH{position:absolute;top:-5000px;left:-5000px}#d__fF{font-family:serif;font-size:200px;visibility:hidden}#twsyxyabbqdwrxzyzxesxywvwuzbszeeacwd{display:none!important}</style>
<script>var w=window;if(w.performance||w.mozPerformance||w.msPerformance||w.webkitPerformance){var d=document;AKSB=w.AKSB||{},AKSB.q=AKSB.q||[],AKSB.mark=AKSB.mark||function(e,_){AKSB.q.push(["mark",e,_||(new Date).getTime()])},AKSB.measure=AKSB.measure||function(e,_,t){AKSB.q.push(["measure",e,_,t||(new Date).getTime()])},AKSB.done=AKSB.done||function(e){AKSB.q.push(["done",e])},AKSB.mark("firstbyte",(new Date).getTime()),AKSB.prof={custid:"632139",ustr:"",originlat:"0",clientrtt:"124",ghostip:"72.247.179.76",ipv6:false,pct:"10",clientip:"79.119.120.57",requestid:"418cf776",region:"26128",protocol:"",blver:14,akM:"b",akN:"ae",akTT:"O",akTX:"1",akTI:"418cf776",ai:"275708",ra:"false",pmgn:"",pmgi:"",pmp:"",qc:""},function(e){var _=d.createElement("script");_.async="async",_.src=e;var t=d.getElementsByTagName("script"),t=t[t.length-1];t.parentNode.insertBefore(_,t)}(("https:"===d.location.protocol?"https:":"http:")+"//ds-aksb-a.akamaihd.net/aksb.min.js")}</script>
</head>
<body>
<div id="distilIdentificationBlock">
</div>
</body>
Update: (from zack6849)
If you look closely inside the head tag, the last meta tag hints that you are being redirected to a captcha page:
<meta http-equiv="refresh" content="10; url=/distil_r_captcha.html?requestId=291c6193-eb12-4e96-b1cd-23ba9a75e659&httpReferrer=%2Fnba-basketball%2Ftoronto-raptors-schedule.html">
If you also search a bit for distilIdentificationBlock, which appears in the HTML, you can see that it is related to scrapers being blocked.
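If you want your code to notice this situation instead of silently parsing an empty page, the two hints above can be checked programmatically. This is only a rough sketch: BlockPageDetector, refreshTarget and looksBlocked are names I made up (they are not part of Jsoup), and the checks are heuristics based on this one response, so they may not match other block pages.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Jsoup): heuristics for spotting the block page shown above. */
final class BlockPageDetector {

    // Matches <meta http-equiv="refresh" content="10; url=..."> and captures the target URL.
    // Assumes the attribute order and quoting seen in the response above.
    private static final Pattern META_REFRESH = Pattern.compile(
            "<meta\\s+http-equiv=\"refresh\"\\s+content=\"\\d+;\\s*url=([^\"]+)\"",
            Pattern.CASE_INSENSITIVE);

    private BlockPageDetector() {}

    /** Returns the URL of the first meta refresh redirect, if the page has one. */
    static Optional<String> refreshTarget(String html) {
        Matcher m = META_REFRESH.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** True if the page redirects to a captcha URL or contains the Distil marker element. */
    static boolean looksBlocked(String html) {
        boolean captchaRedirect = refreshTarget(html)
                .map(url -> url.contains("captcha"))
                .orElse(false);
        return captchaRedirect || html.contains("distilIdentificationBlock");
    }
}
```

You would pass it the string you get from Jsoup.connect(...).get().html() and stop (or back off) when looksBlocked returns true, rather than continuing to select elements that are not there.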
Hope it helps you get a better understanding of what is happening.