
[Reference image: browser dev tools showing the Request URL, indicated by a red arrow]

Imports

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import requests
from time import sleep

Open the page

driver = webdriver.Chrome()
main_url = 'https://www.samsung.com/ph/storelocator/'
driver.get(main_url)
driver.execute_script("window.scrollTo(0, 500)")
sleep(1)
# dismiss the cookie banner; find_element_by_class_name was removed in Selenium 4,
# so use the locator-string form of find_element instead
driver.find_element('class name', 'cm-cookie-geo__close-cta').click()

If I just GET the Request URL shown by the red arrow and swap in my desired parameters (e.g. changing nRadius to 7), plain HTML is returned.

How can I get it to instead update the listing on the left panel like it would if I click the 10km button (except for 7km)?

I have tried using cookies as suggested here like this (without success):

# storing the cookies generated by the browser
request_cookies_browser = driver.get_cookies()

params = {
    'nRadius': 7,
    'latitude': 14.607538,
    'longitude': 121.020967,
    'searchFlag': 'search',
    'modelCode': '',
    'categorySubTypeCode': '',
    'localSearchCallYn': 'N'
}
s = requests.Session()

# passing the cookies generated from the browser to the session
for c in request_cookies_browser:
    s.cookies.set(c['name'], c['value'])

resp = s.post(main_url, params) # I get a 200 status_code

# passing the cookies of the response back to the browser
dict_resp_cookies = resp.cookies.get_dict()
response_cookies_browser = [{'name': name, 'value': value} for name, value in dict_resp_cookies.items()]
for c in response_cookies_browser:
    driver.add_cookie(c)

driver.get(main_url)

Edit 1: I am trying to get the latitude and longitude, which aren't available through that GET URL. They can be found on the main page with:

soup = BeautifulSoup(driver.page_source, 'lxml')
# note: find_all('li') returns a list, so .find() can't be chained on it directly;
# grab the first store's hidden lat input instead
latitude = soup.find('ul', {'id': 'store-list'}).find('li').find('input', {'class': 'lat', 'type': 'hidden'})['value']
  • What exactly are you trying to accomplish? If I GET that url with the changed Radius, it returns a list of stores with a radius around that longitude and latitude. You could scrape store addresses/phone numbers etc. from the html. What do you want from this page? – jarcobi889 Sep 06 '19 at 17:56
  • @jarcobi889 Yes that is a fair point but I am trying to get the latitude and longitude which aren't available through that GET url. It can be found on the main page though. I have edited my post to reflect this. – brandoldperson Sep 06 '19 at 18:03
  • If it's the long/lat you're after, and you just want to make direct requests rather than just use selenium, you'll need to dig into the javascript on the page and find the event that's being fired when you click the 10km button. That will show you what other requests are being sent off and what else is being updated on the page when the ajax request to that url you're seeing returns. – jarcobi889 Sep 06 '19 at 18:12

2 Answers


Looking at the page, it seems like you may be better off scraping the HTML for elements whose distance attribute is less than or equal to 7. This is because the website only accepts specific values for nRadius when returning a search of stores on the map (i.e. only 1, 2, 5, and 10 km).

The way it works is this: the site finds your location and fetches all stores within 10 km (regardless of the distance you have selected), then displays stores on the map based on the nRadius you provide. All of the stores under 10 km away are still listed in the HTML, though.
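So one option is to parse the full store list and filter it yourself. A minimal sketch of that idea, assuming each store's `<li>` carries a hidden distance input (the class names and markup below are stand-ins; the live page's structure may differ):

```python
from bs4 import BeautifulSoup

# Stand-in for the store-list markup; the real page's classes may differ.
html = """
<ul id="store-list">
  <li><h2 class="store-name">STORE A</h2><input class="distance" type="hidden" value="3.2"></li>
  <li><h2 class="store-name">STORE B</h2><input class="distance" type="hidden" value="8.9"></li>
  <li><h2 class="store-name">STORE C</h2><input class="distance" type="hidden" value="6.5"></li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')

# keep only stores whose hidden distance value is <= 7 km
within_7km = [
    li.find('h2', {'class': 'store-name'}).text
    for li in soup.find('ul', {'id': 'store-list'}).find_all('li')
    if float(li.find('input', {'class': 'distance'})['value']) <= 7
]
print(within_7km)  # -> ['STORE A', 'STORE C']
```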

However, I've never done exactly what you're doing, so it could be something else. If you think that passing the cookies/headers between Selenium and requests is what's tripping you up, check out the selenium-requests Python package, which was developed to automatically handle the needed cookies and request headers.

Good Luck!

  • Thank you! But I need the distances to be able to be greater than 10km too. I managed to solve it using the accepted answer. Turns out I was using the parameters wrongly. – brandoldperson Sep 10 '19 at 07:37

You can make a simple GET request with requests and then parse the result with Beautiful Soup. The reason the code in your edit isn't working is that the HTML returned through the GET request is formatted differently from the main page. The following worked for me.

Code

import requests
from bs4 import BeautifulSoup

params = {
    'nRadius': 7,
    'latitude': 14.601026,
    'longitude': 120.984192,
    'searchFlag': 'search',
    'modelCode': None,
    'categorySubTypeCode': None,
    'localSearchCallYn': 'N',
}
url = 'https://www.samsung.com/ph/storelocator/_jcr_content/par.cm-g-store-locator-storelist/'
r = requests.get(url, params=params)
soup = BeautifulSoup(r.text, 'html.parser')

for item_holder in soup.find_all('li'):
    name = item_holder.find('h2', {'class': 'store-name'}).text
    lat = item_holder.find('input', {'class': 'lat', 'type': 'hidden'})['value']
    long = item_holder.find('input', {'class': 'long', 'type': 'hidden'})['value']
    print('\n' + name)
    print(lat, long)
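Since the question already imports pandas, the scraped name/lat/long triples drop straight into a DataFrame. A small sketch of that step, using rows shaped like the loop above produces (the sample values are taken from the output below):

```python
import pandas as pd

# rows in the shape the scraping loop produces; values from the partial output
rows = [
    {'name': 'WESTERN APPLIANCE - RECTO', 'lat': '14.604366', 'long': '120.97991'},
    {'name': 'ANSONS - BINONDO', 'lat': '14.6015268', 'long': '120.97605479999993'},
]

df = pd.DataFrame(rows)
# the hidden inputs are strings, so cast the coordinates to floats
df[['lat', 'long']] = df[['lat', 'long']].astype(float)
print(df)
```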

Output (partial)

WESTERN APPLIANCE - RECTO
14.604366 120.97991

ANSONS - BINONDO
14.6015268 120.97605479999993

SM APPLIANCE CENTER INC. - LUCKY CHINA TOWN
14.6031205 120.9741785

SM APPLIANCE CENTER INC. - MANILA
14.5904064 120.9830574
LuckyZakary