
I would like to parse addresses from the following website: https://filialen.migros.ch/de/center:46.8202,6.9575/zoom:8/

So far I can open the website and close any pop-ups. Next I need to click the drop-down menu labelled "1163 STANDORTE", but my code cannot locate it. My code so far:

import pandas as pd
import requests
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
import time
import itertools
import os
import numpy as np
import csv
import pdb

os.chdir("Directory")
options = webdriver.ChromeOptions()
options.add_argument("--incognito")
driver = webdriver.Chrome('Directory/chromedriver.exe', options=options)  # pass options so --incognito is applied
driver.get("https://filialen.migros.ch/de/center:46.8202,6.9575/zoom:8/")
time.sleep(1)
try:
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='close-icon']"))).click() # if there is smth to click away
except Exception:  # no pop-up to close within 20 s
    pass
time.sleep(4)

Below are my attempts to click it, targeting the span and the button element with several locator strategies:

# Version 1
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='sc-hKFxyN jdMjfs']"))).click()

# Version 2
element = driver.find_element_by_class_name('sc-eCApnc kiXUNl sc-jSFjdj lcZmPE')
driver.execute_script("arguments[0].scrollIntoView();", element)
driver.execute_script("arguments[0].click();", element)

# Version 3
element = driver.find_element_by_class_name('sc-eCApnc kiXUNl sc-jSFjdj lcZmPE')
driver.execute_script("arguments[0].click();", element)

# Version 4
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='sc-eCApnc kiXUNl sc-jSFjdj lcZmPE']"))).click()

# Version 5
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[2]/div/main/nav/header/button[1]"))).click()

# Version 6
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[text()='1163 STANDORTE']"))).click()

Actually, there are three problems:

  1. When I open the link manually in Chrome, "1163 STANDORTE" appears, whereas when I open it in Chrome via Python, fewer STANDORTE appear and I cannot zoom out. I need ALL 1163 STANDORTE to appear.
  2. I cannot locate the button by class name or XPath.
  3. The button probably links to an XML file, and the address information only appears after clicking it. Ultimately I want to scrape the text in the XML file linked to that button.

Any suggestions?

My question is similar to these previous questions: "How to parse several attributes of website with same class name in python?" and "Selenium-Debugging: Element is not clickable at point (X,Y)".

tiny

2 Answers


The data you are looking for is loaded by a fetch/XHR call.

You can get it directly from that API, without scraping the page:

import requests

headers = {'Origin': 'https://filialen.migros.ch',
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36'}

r = requests.get(
    'https://web-api.migros.ch/widgets/stores?key=loh7Diephiengaiv&aggregation_options[empty_buckets]=true&filters[markets][0][0]=super&filters[markets][0][1]=mno&filters[markets][0][2]=voi&filters[markets][0][3]=mp&filters[markets][0][4]=out&filters[markets][0][5]=spx&filters[markets][0][6]=doi&filters[markets][0][7]=mec&filters[markets][0][8]=mica&filters[markets][0][9]=res&filters[markets][0][10]=flori&filters[markets][0][11]=gour&filters[markets][0][12]=alna&filters[markets][0][13]=cof&filters[markets][0][14]=chng&verbosity=store&offset=0&limit=5000',
    headers=headers)
if r.status_code == 200:
    print('stores data below:')
    data = r.json()
    print(data)
else:
    print(f'Oops. Status code is {r.status_code}')
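The long query string in the request above can also be assembled from its parts, which makes the market filters easier to read and edit. This is a minimal sketch, assuming the endpoint accepts the same parameters when the square brackets are percent-encoded (that is what `urlencode` does, and what `requests` does with its `params` argument). `build_store_params` and `MARKETS` are illustrative names, not part of the Migros API.

```python
from urllib.parse import urlencode

BASE_URL = "https://web-api.migros.ch/widgets/stores"

# Market codes copied from the URL in the request above (index i -> filters[markets][0][i]).
MARKETS = ["super", "mno", "voi", "mp", "out", "spx", "doi", "mec",
           "mica", "res", "flori", "gour", "alna", "cof", "chng"]


def build_store_params(offset=0, limit=5000):
    """Return the query parameters as an ordered list of (name, value) pairs."""
    params = [("key", "loh7Diephiengaiv"),
              ("aggregation_options[empty_buckets]", "true")]
    params += [(f"filters[markets][0][{i}]", code) for i, code in enumerate(MARKETS)]
    params += [("verbosity", "store"), ("offset", str(offset)), ("limit", str(limit))]
    return params


# Full URL; urlencode percent-encodes the brackets ([ -> %5B, ] -> %5D).
url = f"{BASE_URL}?{urlencode(build_store_params())}"
print(url)

# With requests, the same list can be passed directly (not executed here):
# r = requests.get(BASE_URL, params=build_store_params(), headers=headers)
```

Keeping the parameters as a list of pairs preserves their order, so the decoded query is identical to the hand-written URL.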
balderman
  • Thanks @balderman, this works. But how did you find the API endpoint? It would help with similar problems in the future. – tiny Sep 20 '21 at 07:37
  • In the browser, press F12 -> Network -> XHR and look at the HTTP calls the page makes to get the data. Feel free to accept the answer. – balderman Sep 20 '21 at 08:54
  • I am very sorry, but I will accept the other answer, as it is more focused on answering the actual question I had, even though your answer brings me to the final goal much quicker. I upvoted yours too though! – tiny Sep 20 '21 at 12:03

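A few points:

  1. Launch the browser with a maximized window.

  2. Use explicit waits.

  3. Use this XPath: //span[contains(@aria-label, 'Standorte anzeigen')]/.. (it finds the span by its aria-label, then steps up to its parent, the clickable button).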

Sample code:

driver = webdriver.Chrome(driver_path)  # driver_path: path to your chromedriver executable
driver.maximize_window()
#driver.implicitly_wait(50)
wait = WebDriverWait(driver, 20)

driver.get("https://filialen.migros.ch/de/center:46.8202,6.9575/zoom:8/")

try:
    wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@class='close-icon']"))).click()  # close the pop-up, if there is one
except Exception:  # no pop-up appeared within the timeout
    pass

wait.until(EC.element_to_be_clickable((By.XPATH, "//span[contains(@aria-label, 'Standorte anzeigen')]/.."))).click()

Imports:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

PS: Please check in the Chrome dev tools whether the XPath matches a unique entry in the HTML DOM.

Steps to check:

Press F12 in Chrome -> go to the Elements tab -> press CTRL + F -> paste the XPath and check whether your desired element is highlighted with 1/1 matching node.
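To see what the trailing `/..` in that XPath does, here is a small offline sketch using Python's built-in `xml.etree.ElementTree`. The markup is a hypothetical, simplified stand-in for the real page (only the class string and the "Standorte anzeigen" text come from this thread). Since ElementTree's path syntax has no `contains()`, the hypothetical `clickable_parent` function reproduces the XPath by hand: find the span whose aria-label contains the text, then return its parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page's HTML is not reproduced here.
SNIPPET = """
<nav>
  <header>
    <button class="sc-eCApnc kiXUNl sc-jSFjdj lcZmPE">
      <span aria-label="1163 Standorte anzeigen">1163 STANDORTE</span>
    </button>
  </header>
</nav>
"""


def clickable_parent(root, label_fragment):
    """Mimic //span[contains(@aria-label, label_fragment)]/.. (first match only)."""
    # ElementTree elements do not know their parent, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for span in root.iter("span"):
        if label_fragment in span.get("aria-label", ""):
            return parent_of.get(span)  # the "/.." step
    return None


root = ET.fromstring(SNIPPET.strip())
target = clickable_parent(root, "Standorte anzeigen")
print(target.tag)  # button
```

In Selenium you would keep the XPath from the answer; the sketch only illustrates why selecting the span's parent lands on the element that receives the click.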

cruisepandey
  • Thanks @cruisepandey, this works. But could you elaborate on how you came up with "//span[contains(@aria-label, 'Standorte anzeigen')]/.."? It would help me (and others) a lot to understand the underlying mechanics. – tiny Sep 20 '21 at 07:36
  • @tiny: open the dev tools in Chrome; it's basically an `xpath`, and after inspecting the HTML you can construct your own custom XPath. See https://www.w3schools.com/xml/xpath_axes.asp to learn more about XPath. – cruisepandey Sep 20 '21 at 07:39
  • @cruisepandey: I tried this (with CTRL+Shift+I), but there I can only find the button as `class="sc-eCApnc kiXUNl sc-jSFjdj lcZmPE"`, which I tried to target with an `xpath`, without success. I cannot find the text from your `xpath` ("Standorte anzeigen") anywhere in the HTML, so I am probably looking in the wrong place. Where exactly did you find it? – tiny Sep 20 '21 at 08:42
  • See above; I have updated the answer with how to check the elements. – cruisepandey Sep 20 '21 at 08:43
  • @cruisepandey: Okay, thanks a lot for updating! Still, I don't understand why I can't navigate to the button with an `xpath` using the class specified above. – tiny Sep 20 '21 at 09:15
  • Because class names with spaces do not work: replace each space with a `.` (as in a CSS selector) and it should work. – cruisepandey Sep 20 '21 at 09:17
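Regarding the last comment: `find_element_by_class_name` expects a single class name, so a compound value such as `sc-eCApnc kiXUNl sc-jSFjdj lcZmPE` has to be turned into a CSS selector (dots instead of spaces), or matched one class at a time in XPath. Below is a minimal sketch of both conversions; the helper names are hypothetical. These `sc-...` class names also look auto-generated and may change when the site is updated, which is one reason the aria-label XPath is likely the more robust choice.

```python
def css_from_class_attr(class_attr):
    """Turn a class attribute like 'a b c' into the CSS selector '.a.b.c'.

    The result matches elements that carry all of the listed classes.
    """
    return "".join("." + name for name in class_attr.split())


def xpath_has_class(name):
    """XPath 1.0 predicate that is true when the element's class list contains `name`.

    Padding with spaces avoids false matches such as 'kiXUNl' inside 'kiXUNl2'.
    """
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


classes = "sc-eCApnc kiXUNl sc-jSFjdj lcZmPE"
print(css_from_class_attr(classes))  # .sc-eCApnc.kiXUNl.sc-jSFjdj.lcZmPE
print(f"//*[{xpath_has_class('kiXUNl')}]")

# Usage with Selenium (not executed here):
# driver.find_element(By.CSS_SELECTOR, css_from_class_attr(classes)).click()
```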