
I've been trying to scrape data from a web page using Python, and so far so good. But the problem is that the page doesn't load everything right away: there is a "show more" button, so my script only scrapes the first 10 items. I've looked at the site and there is nothing I can do with the URL. I guess I have to post something to the server to get the next items back, but I don't know what to post or how. Here is my code:

import requests
import bs4

res = requests.get('https://candidat.pole-emploi.fr/offres/recherche?motsCles=serveur&offresPartenaires=true&rayon=20&tri=0')

page_soup = bs4.BeautifulSoup(res.text,"html.parser")

containers = page_soup.findAll("div",{"class":"media-body"})
url = []
for container in containers:
    url.append('https://candidat.pole-emploi.fr' +container.h2.a["href"])



for i in url:
    print(i)
email_list = []

for address in url:
    print('testing', address)
    found = False
    detail = requests.get(address)
    apply = bs4.BeautifulSoup(detail.text,"html.parser")
    apply_mail = apply.findAll("div",{"class":"apply-block"})
    if apply_mail == []:
        email_list.append('not found')
        continue

    email_raw = apply_mail[0].text
    for i in email_raw.splitlines():
        if '@' in i:
            email_list.append(i)
            found = True
    if not found:
        email_list.append('not found')



for i in email_list:
    print(i)
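As an aside, the line-by-line `'@'` check above can pick up stray text around the address. A small regex helper (illustrative only, not part of the original script; the pattern is deliberately loose) makes the extraction a bit more robust:

```python
import re

# Loose email pattern, good enough for pulling addresses out of job-ad text.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def extract_emails(text):
    """Return all email-like strings found in a block of text."""
    return EMAIL_RE.findall(text)

sample = "Candidature par mail\ncontact@example.com\nTel: 01 23 45 67 89"
print(extract_emails(sample))  # → ['contact@example.com']
```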
Alan Kavanagh

1 Answer


The only data you can scrape with BeautifulSoup or any other HTTP request library is what is available in the initial response, before any JavaScript runs. It is the same as doing `curl $URL` and parsing the result.

One way to approach this problem would be to use the Selenium WebDriver and program the same actions a user would perform in the browser.
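A minimal sketch of that approach might look like the following. The CSS selector for the "show more" button is a placeholder you would need to find yourself in the browser's dev tools, and the fixed 2-second sleep is a crude stand-in for a proper explicit wait:

```python
import time

def load_all_results(url, button_selector, max_clicks=10):
    """Open the page in a real browser, click the 'show more' button
    until it disappears (or max_clicks is reached), and return the
    fully-loaded page source for BeautifulSoup to parse.

    Requires selenium and a matching browser driver on PATH.
    """
    # Imported lazily so the rest of the script works without selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.common.exceptions import NoSuchElementException

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        for _ in range(max_clicks):
            try:
                button = driver.find_element(By.CSS_SELECTOR, button_selector)
            except NoSuchElementException:
                break  # no more 'show more' button: everything is loaded
            button.click()
            time.sleep(2)  # crude wait for the new items to render
        return driver.page_source
    finally:
        driver.quit()
```

The returned `page_source` can then be fed to `bs4.BeautifulSoup` exactly as in the original script, in place of `res.text`.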

More information can be found in the Selenium documentation.

Sumit Jha