Scraping startpage with bs4 and requests

Question

I'm trying to scrape the search results from http://startpage.com/. I have already scraped the first page of results using bs4 and requests, but I can't get to the next page of results. I can't find a link to it using the browser's developer tools. When I inspect the number 2 button, this is what it shows:

<a href="javascript:;" class="numbers_st" onclick="mysubmit(10); return false;" id="2">2</a>

The other option is the Next button:

<a href="javascript:document.nextform.submit();" class="numbers_st" style="width:200px; text-align:left;">Next<span class="i_next"></span></a>

How do I make a request, or whatever else I need to do, to get to the next page after scraping the results of the first page?

import requests
from bs4 import BeautifulSoup


def dork():
    url = 'https://www.startpage.com/do/search?cmd=process_search&query=inurl:admin&language=english_au&cat=web&with_language=&with_region=&pl=&ff=&rl=&abp=-1&with_date=m'
    # Fetch the first page of search results.
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    # Each result title is an <h3 class="clk"> that wraps the result link.
    for heading in soup.find_all('h3', {'class': 'clk'}):
        for link in heading.find_all('a'):
            print(link.get('href'))


dork()

That's the code that gets the links.

score 0 · Answer 1 · answered Jul 10 '17 at 08:32

I recommend trying Selenium with PhantomJS, which gives you a real, headless, scriptable browser. Check out this answer
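If you would rather stay with requests and bs4, note that the Next link in the question calls document.nextform.submit(), which suggests the page contains a form named nextform. The following is only a minimal sketch, assuming that form is present in the HTML that requests receives and that its inputs carry the paging state; neither assumption has been checked against the live site. form_request and fetch_next_page are hypothetical helper names, not part of either library.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def form_request(html, page_url, form_name="nextform"):
    """Return (method, action_url, fields) for the named form, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # 'name' must go through attrs, because find(name=...) means the tag name.
    form = soup.find("form", attrs={"name": form_name})
    if form is None:
        return None
    # Collect every named <input>, which includes any hidden paging fields.
    fields = {
        field["name"]: field.get("value", "")
        for field in form.find_all("input")
        if field.get("name")
    }
    method = form.get("method", "get").lower()
    # Resolve a relative action against the URL of the page it came from.
    action = urljoin(page_url, form.get("action", ""))
    return method, action, fields


def fetch_next_page(session, html, page_url):
    """Submit the 'next' form as a browser would; return None if it is absent."""
    request = form_request(html, page_url)
    if request is None:
        return None
    method, action, fields = request
    if method == "post":
        return session.post(action, data=fields)
    return session.get(action, params=fields)
```

To use it, create a requests.Session(), fetch the first page, then call fetch_next_page(session, response.text, response.url) in a loop until it returns None, running the same h3/clk parsing from the question on each response. If the form is built by JavaScript and never appears in the raw HTML, this will not work and a real browser such as the one above is the safer route.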

Ketan Mukadam
