
I need to scrape Pinterest, but instead of links to pictures, I get unintelligible links that don't work.

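import requests
from bs4 import BeautifulSoup

def parse():
    url = 'https://www.pinterest.ie/'
    r = requests.get(url)  # downloads only the initial HTML; no JavaScript runs
    soup = BeautifulSoup(r.text,'lxml')
    print(soup.find_all('a'))
parse()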
    Have you LOOKED at the source code for that page, using View Source or by printing out `r.text`? The HTML you fetch contains little more than ads. The page is built dynamically with JavaScript. You'd need to use something like Selenium to get a real browser involved. – Tim Roberts Sep 03 '22 at 06:19

1 Answer


The site builds its content with JavaScript, which never runs when you fetch the page with requests: requests only downloads the initial HTML, and BeautifulSoup only parses what it is given. A workaround is to use Selenium to open the page in an actual browser (so the JavaScript runs), and then pass the rendered HTML to BeautifulSoup.
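To see why the links go missing, here is a minimal sketch that uses only the standard library, so it runs without a network connection or a browser. `RAW_HTML` and `RENDERED_HTML` are made-up stand-ins (not real Pinterest markup) for what a plain HTTP request receives and for the page after its script has run; `LinkCollector` and `find_links` are illustrative names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_links(html):
    """Return all link targets found in the given HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical stand-in for the response to a plain HTTP request:
# an empty container plus a script that would fill it in a browser.
RAW_HTML = """
<html><body>
  <div id="root"></div>
  <script>
    // A browser would run this and add the links to #root.
    document.getElementById("root").innerHTML = "...";
  </script>
</body></html>
"""

# Hypothetical stand-in for the page after the script has run.
RENDERED_HTML = """
<html><body>
  <div id="root"><a href="/pin/1/">Pin 1</a><a href="/pin/2/">Pin 2</a></div>
</body></html>
"""

print(find_links(RAW_HTML))       # []
print(find_links(RENDERED_HTML))  # ['/pin/1/', '/pin/2/']
```

The real response will differ in detail, but the principle is the same: a parser can only find what is already in the HTML it receives.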

Something like this should work:

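from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to the ChromeDriver binary; adjust it for your system.
driver = webdriver.Chrome(service=Service("../chromedriver.exe"))

def parse():
    url = 'https://www.pinterest.ie/'
    driver.get(url)  # load the page in a real browser so JavaScript runs
    html = driver.page_source  # HTML of the page as the browser currently has it
    soup = BeautifulSoup(html, 'lxml')
    print(soup.find_all('a'))

try:
    parse()
finally:
    driver.quit()  # close the browser and stop the driver process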

You will, of course, need some idea of how to use Selenium. The official docs should help.
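One thing to watch for: `driver.page_source` may be read before the page's scripts have finished adding content. As a rough sketch of one way around that, the hypothetical helper below (`wait_for_images` is my name, not part of Selenium) polls until at least one `<img>` element exists. It is a hand-rolled loop kept free of Selenium imports so it is easy to try out; Selenium's own `WebDriverWait` is the more robust tool for the same job.

```python
import time


def wait_for_images(driver, timeout=10.0, poll=0.5):
    """Poll until the page has at least one <img>, then return its HTML.

    `driver` is any Selenium WebDriver that has already loaded a page.
    Raises TimeoutError if no image appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "tag name" is the locator string behind Selenium's By.TAG_NAME.
        if driver.find_elements("tag name", "img"):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError("no <img> elements appeared in time")
        time.sleep(poll)
```

After `driver.get(url)`, you could call `html = wait_for_images(driver)` and pass `html` to BeautifulSoup; `soup.find_all('img')` would then give the image tags rather than the `a` tags.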

M B