
I am trying to extract all href links inside the div elements with class ['address']. Every time I run the code I only get the first 5, even though I know there should be 9.

Web page: https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch

I have read through a variety of related threads and altered my code countless times, including switching between all the parsers (html.parser, html5lib, lxml, xml, lxml-xml), but nothing seems to work. Any idea what's causing it to stop after the 5th result? I am still fairly new to Python, so I apologize if this is a rookie mistake that I'm overlooking. Any help would be appreciated, even the sarcastic answers :)

I used very similar code on the following web pages and had no issues scraping the hrefs: https://www.walgreens.com/storelistings/storesbystate.jsp?requestType=locator https://www.walgreens.com/storelistings/storesbycity.jsp?requestType=locator&state=AK

My code below:

import requests
from bs4 import BeautifulSoup


local_rg = requests.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')

local_rg_content = local_rg.content
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')

for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)

My results (first 5):

  1. /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
  2. /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
  3. /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
  4. /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
  5. /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680

But there should be 9:

  1. /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
  2. /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
  3. /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
  4. /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
  5. /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
  6. /locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
  7. /locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
  8. /locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
  9. /locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
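A quick way to check whether the missing stores are in the HTML that requests receives at all is to look for their ids in the raw response text. This is a minimal sketch with hypothetical helper names (store_ids_in_html, missing_ids); it assumes every store link ends in /id=NNNNN, as in the list above.

```python
import re

# The nine store ids from the expected list above (the trailing "/id=NNNNN").
EXPECTED_IDS = {"15092", "13656", "15653", "12679", "12680",
                "15654", "13449", "15362", "12681"}


def store_ids_in_html(html):
    """Return the set of store ids that appear as '/id=NNNNN' anywhere in the raw HTML text."""
    return set(re.findall(r"/id=(\d+)", html))


def missing_ids(html, expected=EXPECTED_IDS):
    """Return, sorted, the expected ids that never occur in the HTML text."""
    return sorted(expected - store_ids_in_html(html))
```

For example, print(missing_ids(requests.get(url).text)) with the page URL above. If ids are reported missing, they are not in the static HTML at all and are most likely added by JavaScript after the page loads, which no choice of parser can fix. If they are present, they may be embedded in script data rather than in the address divs.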

2 Answers


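Try using Selenium instead of requests to get the source code of the page. Selenium drives a real browser, so the page's JavaScript runs and driver.page_source reflects the rendered page, including content loaded after the initial HTML. Here is how you do it: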

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')

local_rg_content = driver.page_source
driver.quit()
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')

The rest of the code is the same. Here is the full code:

from bs4 import BeautifulSoup
from selenium import webdriver

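# Selenium drives a real Chrome browser, so the page's JavaScript (including its Ajax calls) runs
driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')

# page_source is the DOM as rendered by the browser, not the raw server response
local_rg_content = driver.page_source
driver.quit()  # end the WebDriver session and close the browser
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')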

for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)

Output:

/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
Answered by Sushil

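The page uses Ajax to load the store information from a separate URL, so the HTML that requests downloads does not contain all of the stores. You can call that URL directly with requests and decode the JSON it returns: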

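import re
import json
import requests


url = 'https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch'
# The endpoint the page itself calls via Ajax to fetch the store list
ajax_url = 'https://www.walgreens.com/locator/v1/stores/search?requestor=search'

# The search takes coordinates: pull the city's latitude/longitude out of the page HTML
m = re.search(r'"lat":([\d.-]+),"lng":([\d.-]+)', requests.get(url).text)
if m is None:
    raise RuntimeError('Could not find lat/lng in the page HTML')

params = {
    'lat': m.group(1),
    'lng': m.group(2)
}

# POST the coordinates as a JSON body and decode the JSON response
response = requests.post(ajax_url, json=params)
response.raise_for_status()
data = response.json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for result in data['results']:
    print(result['store']['address']['street'])
    print('https://www.walgreens.com' + result['storeSeoUrl'])
    print('-' * 80)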

Prints:

1470 W NORTHERN LIGHTS BLVD
https://www.walgreens.com/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
--------------------------------------------------------------------------------
725 E NORTHERN LIGHTS BLVD
https://www.walgreens.com/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
--------------------------------------------------------------------------------
4353 LAKE OTIS PARKWAY
https://www.walgreens.com/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
--------------------------------------------------------------------------------
7600 DEBARR RD
https://www.walgreens.com/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
--------------------------------------------------------------------------------
2197 W DIMOND BLVD
https://www.walgreens.com/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
--------------------------------------------------------------------------------
2550 E 88TH AVE
https://www.walgreens.com/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
--------------------------------------------------------------------------------
12405 BRANDON ST
https://www.walgreens.com/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
--------------------------------------------------------------------------------
12051 OLD GLENN HWY
https://www.walgreens.com/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
--------------------------------------------------------------------------------
1721 E PARKS HWY
https://www.walgreens.com/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
--------------------------------------------------------------------------------
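If you want to test the regex and the response handling without hitting the site, they can be factored into small functions. This is a sketch, not part of the original answer: extract_lat_lng and store_links are hypothetical names, the field names (results, store/address/street, storeSeoUrl) are the ones the code above reads, and the endpoint is not a documented public API, so its response shape may change.

```python
import re
from urllib.parse import urljoin

BASE = "https://www.walgreens.com"
# Same pattern as the answer's code, used to pull the coordinates out of the page HTML.
LAT_LNG_RE = re.compile(r'"lat":([\d.-]+),"lng":([\d.-]+)')


def extract_lat_lng(html):
    """Return {'lat': ..., 'lng': ...} as strings, ready to send as the JSON body."""
    m = LAT_LNG_RE.search(html)
    if m is None:
        raise ValueError("lat/lng not found; the page layout may have changed")
    return {"lat": m.group(1), "lng": m.group(2)}


def store_links(data):
    """Yield (street, absolute URL) pairs from the decoded search response."""
    for result in data.get("results", []):
        street = result["store"]["address"]["street"]
        # urljoin also copes if storeSeoUrl is ever returned as an absolute URL
        yield street, urljoin(BASE, result["storeSeoUrl"])
```

Usage: data = requests.post(ajax_url, json=extract_lat_lng(requests.get(url).text)).json(), then for street, link in store_links(data): print(street, link).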
Answered by Andrej Kesely
Thank you @Andrej Kesely. Definitely good to know going forward in terms of the Ajax. And your code worked for me and returned the results I needed. – brandtljames Oct 20 '20 at 15:35