import time
from selenium import webdriver

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager





options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Keep Chrome open after the script ends so you can see what happened;
# comment out the next line to let the browser close.
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

URL = 'https://advpalata.vrn.ru/registers/reestr_lawyers/'
driver.get(URL)

# Click the first entry of the letter filter
title = driver.find_element(By.XPATH, '//ul[@class="letter-filter"]//li[1]')
title.click()

page_links = [element.get_attribute('href') for element in driver.find_elements(By.XPATH, "//td[@class='name']//a")]

for link in page_links:
    driver.get(link)
    time.sleep(2)
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3"))).text)

driver.close()

I want to extract the name, but it comes out in a different format. For example, for the page https://advpalata.vrn.ru/registers/reestr_lawyers/abdullaev_parviz_zairhan_ogly/ I get output like this:

\xd0\x90\xd0\xb1\xd0\xb0\xd0\xba\xd1\x83\xd0\xbc\xd0\xbe\xd0\xb2

but I want output like this:

Абдуллаев Парвиз Заирхан оглы
Amen Aziz
  • In your own words, where the code says `name=driver.find_element("xpath", '//h3').text.encode('utf-8')`, what do you think this means? Specifically, what effect do you expect the `.encode('utf-8')` part to have? Did you try leaving this out? What happens if you leave it out? (If you don't understand why this happens, please read https://nedbatchelder.com/text/unipain.html .) – Karl Knechtel Jul 26 '22 at 21:19
  • The output you got just looks like a Python byte string https://stackoverflow.com/a/6224384/150978. Have you tried to decode it to a character string? – Robert Jul 26 '22 at 21:20
  • @Robert rather than decoding, it would be better not to encode it in the first place, as the string was already present in the code. – Karl Knechtel Jul 26 '22 at 21:20
  • Another thing you should be looking at is that `except: pass`, a bad programming practice: https://stackoverflow.com/questions/21553327/why-is-except-pass-a-bad-programming-practice – Barry the Platipus Jul 26 '22 at 22:03

1 Answer


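The WebElements are loaded dynamically, so you need to wait for the elements and their text to load completely before you try to extract them. Moreover, you don't need to encode the text to UTF-8 explicitly: in Python 3, `.text` already returns a Unicode `str`, and calling `.encode('utf-8')` on it produces a `bytes` object, which `print()` displays as `\x..` escapes instead of Cyrillic letters.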
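As a minimal sketch of why the escaped output appears, the snippet below uses a hard-coded string in place of the value Selenium would return from `.text` (the variable names are illustrative only, and no browser is involved). It shows that encoding turns the `str` into `bytes`, and that decoding recovers the original text:

```python
# Stand-in for what element.text would return: a regular Unicode str.
name = "Абакумов"

# Encoding produces a bytes object, not text.
encoded = name.encode("utf-8")

print(type(name).__name__)      # the original value is a str
print(type(encoded).__name__)   # the encoded value is bytes
print(encoded)                  # bytes are displayed with \x.. escapes
print(encoded.decode("utf-8"))  # decoding gives the readable name back
```

The byte string shown in the question decodes to "Абакумов" in exactly this way, so printing `.text` directly, without `.encode('utf-8')`, should be enough.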


Solution

To print the name, you should ideally use WebDriverWait with visibility_of_element_located(). You can use any one of the following locator strategies:

  • Using TAG_NAME:

    # -*- coding: utf-8 -*-
    # driver.execute("get", {'url': 'https://advpalata.vrn.ru/registers/reestr_lawyers/abdullaev_parviz_zairhan_ogly/'})
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.TAG_NAME, "h3"))).text)
    
  • Using CSS_SELECTOR:

    # -*- coding: utf-8 -*-
    # driver.execute("get", {'url': 'https://advpalata.vrn.ru/registers/reestr_lawyers/abdullaev_parviz_zairhan_ogly/'})
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h3"))).text)
    
  • Using XPATH:

    # -*- coding: utf-8 -*-
    # driver.execute("get", {'url': 'https://advpalata.vrn.ru/registers/reestr_lawyers/abdullaev_parviz_zairhan_ogly/'})
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3"))).text)
    
  • Console Output:

    Абдуллаев Парвиз Заирхан оглы
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
undetected Selenium