Unable to retrieve links using beautiful soup and python

Question

I'm trying to extract all the URLs under the URL: https://www.scotts.com/en-us/library/lawn-food

I have noticed that it does not return some URLs, such as https://www.scotts.com/en-us/library/lawn-food/when-feed-greener-lawn, and a few others.

My code snippet is below:

import time
from random import randint
import requests
from bs4 import BeautifulSoup, SoupStrainer
import re

def scrape_google_summaries(url):
    time.sleep(randint(0, 2))  # short random delay so we don't hammer the server
    r = requests.get(url)
    content = r.text

    # Parse only <a> tags that carry an href attribute
    soup = BeautifulSoup(content, "html.parser", parse_only=SoupStrainer('a', href=True))
    summary = []
    for link in soup.find_all('a'):
        summary.append(link.get('href'))

    return summary

output = scrape_google_summaries("https://www.scotts.com/en-us/library/lawn-food")

The website loads its data using JavaScript. I believe that's the reason why you aren't getting the expected result. — imxitiz, Aug 04 '21 at 09:52
That site is being loaded by JavaScript. Use `Selenium`. — Ram, Aug 04 '21 at 12:08

score 1 · Accepted Answer · edited Aug 04 '21 at 10:56

I checked by saving r.text (the content variable) to a local file and opening it in my browser. As expected, none of the article links you are trying to scrape were there, which means those links are generated dynamically by JavaScript. BeautifulSoup only parses the HTML it is given and does not execute JavaScript, so it cannot see dynamically generated content on its own. You will have to use another tool, such as selenium or requests_html, to render the page first.
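
As a rough illustration of the Selenium route, here is a minimal sketch. It assumes Selenium 4 with a local Chrome install; the names normalize_links and scrape_rendered_links, and the default CSS selector (guessed from the article URL in the question), are hypothetical and may need adjusting for the real page.

```python
from urllib.parse import urljoin


def normalize_links(hrefs, base_url):
    """Resolve hrefs against base_url, dropping empties and duplicates (order kept)."""
    seen = []
    for href in hrefs:
        if not href:
            continue
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.append(absolute)
    return seen


def scrape_rendered_links(url, link_selector="a[href*='/library/lawn-food/']",
                          wait_seconds=10):
    """Load url in headless Chrome and return hrefs of anchors matching link_selector.

    link_selector is an assumption about how the article links look on this site.
    """
    # Imported lazily so normalize_links works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has added at least one matching anchor to the DOM.
        WebDriverWait(driver, wait_seconds).until(
            lambda d: d.find_elements(By.CSS_SELECTOR, link_selector)
        )
        anchors = driver.find_elements(By.CSS_SELECTOR, link_selector)
        hrefs = [a.get_attribute("href") for a in anchors]
    finally:
        driver.quit()
    return normalize_links(hrefs, url)
```

Calling scrape_rendered_links("https://www.scotts.com/en-us/library/lawn-food") should then return the links present after rendering, provided the selector matches and the page finishes loading within the wait time.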

edited Aug 04 '21 at 10:56

DisappointedByUnaccountableMod


answered Aug 04 '21 at 10:34

Ajay Singh Rana


score 0 · Answer 2 · answered Aug 05 '21 at 09:42

I'd recommend using selenium and its scroll-down functionality, so that content loaded as you scroll is also fetched.

More information here: https://stackoverflow.com/a/27760083/8623540
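
A common way to scroll with Selenium is to jump to the bottom of the page repeatedly until the page height stops growing. The sketch below assumes an already-created driver; the function name scroll_to_bottom and the pause and max_rounds defaults are illustrative choices, not taken from the linked answer.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until document height stops changing; return the final height."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        # Jump to the current bottom of the page.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the page time to load any additional content.
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return last_height
```

After scroll_to_bottom(driver) returns, the links can be collected from driver.page_source or via driver.find_elements as usual.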

answered Aug 05 '21 at 09:42

Benjamin

