I am currently using Python to scrape a site with thousands of pages. It works fine, but it takes a couple of hours to go through all the pages in parts, because I keep a short delay between requests, which I believe is fair to the provider of the site. However, the real site has a dropdown menu with an option to display more results per page. In the HTML it looks like this:
<div class="page-sizer">
<select id="itemsPerPage" class="form-control input-sm">
<option value="10" selected>10</option>
<option value="50" >50</option>
<option value="200" >200</option>
</select>
</div>
<script>
$(document).on('bb:ready', function () {
var pageSizeOptions = {
setPageSizeUrl: '/Pager/SetPageSize'
};
ScrapeThisWebsite.PageSize.init(pageSizeOptions);
});
</script>
Is there any way for me to automatically request 200 results per page instead of only 10, and save some time for both the provider and me? The selection does not show up in the URL, so if I copy the page link into another browser, it falls back to the default of 10.
I'm going through the pages using the following simple steps:
import requests
myheaders = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36", "Upgrade-Insecure-Requests": "1", "DNT": "1", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en-US,en;q=0.5", "Accept-Encoding": "gzip, deflate"}
page = requests.get(url, headers=myheaders)  # url is the address of the page currently being scraped
Is this related to how the page is loaded?
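My guess is that the dropdown sends the chosen size to the `/Pager/SetPageSize` URL from the script above, and that the server then remembers it through a session cookie. Here is a minimal sketch of what I would try, under those assumptions. The use of POST, the `pageSize` field name, and the helper names `set_page_size` and `fetch_pages` are all my own guesses and not something I have confirmed on the site; the real request would have to be checked in the browser's network tab.

```python
import time
from urllib.parse import urljoin

# Path taken from the setPageSizeUrl option in the page's script.
SET_PAGE_SIZE_PATH = "/Pager/SetPageSize"


def set_page_size(session, base_url, size=200, field="pageSize"):
    """Ask the server to remember a page size for this session.

    ASSUMPTIONS: the endpoint accepts a POST with a form field, and the
    field is called "pageSize". Both are guesses, not confirmed.
    `session` is expected to be a requests.Session so cookies persist.
    """
    endpoint = urljoin(base_url, SET_PAGE_SIZE_PATH)
    response = session.post(endpoint, data={field: size})
    response.raise_for_status()
    return response


def fetch_pages(session, urls, delay=1.0):
    """Yield the HTML of each URL, pausing `delay` seconds between requests."""
    for index, url in enumerate(urls):
        if index:  # no pause before the very first request
            time.sleep(delay)
        response = session.get(url)
        response.raise_for_status()
        yield response.text


# Intended usage (not run here, needs the requests package and the real URLs):
#   import requests
#   session = requests.Session()
#   session.headers.update(myheaders)
#   set_page_size(session, url, size=200)
#   for html in fetch_pages(session, list_of_page_urls):
#       ...parse html...
```

The point of going through one `requests.Session` instead of separate `requests.get` calls is that any cookie set by the page-size request would be sent along with the later page requests. If the site stores the setting some other way, this would not help.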