Extracting a title with Selenium gives me the wrong output

Question

import time
from selenium import webdriver

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Keep Chrome open after the script finishes so you can see what happened; comment this line out to let it close
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

URL ='https://advpalata.vrn.ru/registers/reestr_lawyers/'
driver.get(URL)

title=driver.find_element("xpath", '//ul[@class="letter-filter"]//li[1]')
title.click()

page_links = [element.get_attribute('href') for element in driver.find_elements(By.XPATH, "//td[@class='name']//a")]

for link in page_links:
    driver.get(link)
    time.sleep(2)
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3"))).text)

driver.close()

I want to extract the name, but it comes out in a different format. The page link is https://advpalata.vrn.ru/registers/reestr_lawyers/abdullaev_parviz_zairhan_ogly/ and the output I get looks like this:

\xd0\x90\xd0\xb1\xd0\xb0\xd0\xba\xd1\x83\xd0\xbc\xd0\xbe\xd0\xb2

but I want this output:

Абдуллаев Парвиз Заирхан оглы

In your own words, where the code says `name=driver.find_element("xpath", '//h3').text.encode('utf-8')`, what do you think this means? Specifically, what effect do you expect the `.encode('utf-8')` part to have? Did you try leaving this out? What happens if you leave it out? (If you don't understand why this happens, please read https://nedbatchelder.com/text/unipain.html .) — Karl Knechtel, Jul 26 '22 at 21:19
The output you got just looks like a Python byte string https://stackoverflow.com/a/6224384/150978. Have you tried to decode it to a character string? — Robert, Jul 26 '22 at 21:20
@Robert rather than decoding, it would be better not to encode it in the first place, as the string was already present in the code. — Karl Knechtel, Jul 26 '22 at 21:20
Another thing you should be looking at is that `except: pass`, a bad programming practice: https://stackoverflow.com/questions/21553327/why-is-except-pass-a-bad-programming-practice — Barry the Platipus, Jul 26 '22 at 22:03
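A minimal, browser-free sketch of the point made in these comments, using only the Python standard library. The string literal stands in for what Selenium's `.text` returns; it is `Абакумов`, which is exactly what the bytes shown in the question decode to.

```python
# Selenium's .text returns a str, i.e. already-decoded Unicode text.
# This literal stands in for that value.
name = "Абакумов"

# .encode('utf-8') converts the str into bytes. Printing bytes shows their
# repr, b'...' with \x escapes, which is the "wrong output" in the question.
encoded = name.encode("utf-8")
print(encoded)  # b'\xd0\x90\xd0\xb1\xd0\xb0\xd0\xba\xd1\x83\xd0\xbc\xd0\xbe\xd0\xb2'

# Decoding the bytes recovers the original text...
decoded = encoded.decode("utf-8")
assert decoded == name

# ...but it is simpler not to encode at all: print the str directly.
print(name)
```

In other words, decoding after the fact works, but dropping `.encode('utf-8')` avoids the round trip entirely.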

undetected Selenium · Answer 1 · 2022-07-27T10:51:47.290


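The WebElements are dynamically loaded, so you need to wait for the elements and their text to load completely before you attempt to extract them. Moreover, you don't need to encode the text to UTF-8 explicitly: in Python 3, `.text` already returns a `str` (Unicode text), and calling `.encode('utf-8')` on it turns it into `bytes`, which is why the output looked like `\xd0\x90\xd0\xb1...`.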
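To illustrate the waiting argument, here is a simplified, hypothetical `wait_until` helper in plain Python. Instead of always sleeping a fixed time like `time.sleep(2)`, it polls a condition and returns as soon as the condition yields a truthy value, raising once a timeout passes. This only sketches the idea and is not Selenium's code; the real `WebDriverWait(driver, 20).until(...)` follows the same pattern, polling every 0.5 seconds by default, ignoring `NoSuchElementException` between polls, and raising `TimeoutException`.

```python
import time


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Hypothetical helper: call condition() until it returns a truthy value.

    Returns that value as soon as it appears, instead of sleeping for a
    fixed time, and raises TimeoutError once the timeout has elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)


# Usage: a stand-in condition that only "finds" the heading on the third poll,
# simulating content that appears some time after the page itself has loaded.
attempts = {"count": 0}


def heading_is_visible():
    attempts["count"] += 1
    return "heading text" if attempts["count"] >= 3 else None


print(wait_until(heading_is_visible, timeout=5, poll_frequency=0.01))  # prints: heading text
```

The design point is that the wait ends as soon as the element is ready, so fast pages are not slowed down and slow pages get up to the full timeout.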

Solution

To print the name, ideally you should use WebDriverWait with visibility_of_element_located(), with any of the following locator strategies:

Using TAG_NAME:

#_*_coding: utf-8_*_
# driver.get('https://advpalata.vrn.ru/registers/reestr_lawyers/abdullaev_parviz_zairhan_ogly/')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.TAG_NAME, "h3"))).text)

Using CSS_SELECTOR:

#_*_coding: utf-8_*_
# driver.get('https://advpalata.vrn.ru/registers/reestr_lawyers/abdullaev_parviz_zairhan_ogly/')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h3"))).text)

Using XPATH:

#_*_coding: utf-8_*_
# driver.get('https://advpalata.vrn.ru/registers/reestr_lawyers/abdullaev_parviz_zairhan_ogly/')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3"))).text)

Console Output:

Абдуллаев Парвиз Заирхан оглы

Note: you have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

edited Jul 27 '22 at 10:51

answered Jul 26 '22 at 21:36

undetected Selenium


Irrelevant in this context: the element will be located with or without a wait. The issue was his encoding. – Barry the Platipus Jul 26 '22 at 21:42
@platipus_on_fire Which part did you find irrelevant here? _WebDriverWait_? – undetected Selenium Jul 26 '22 at 21:47
That's correct, yes. His issue was not locating the element (although that XPath is a little fragile, and he should select something more robust); his issue was the encoding. – Barry the Platipus Jul 26 '22 at 21:48
@platipus_on_fire Sorry to say, that's a good approach using _requests_ / _BS4_, not _Selenium_ at least. – undetected Selenium Jul 26 '22 at 21:49
You're missing the point: your response *does not* address the issue at hand, in this instance. – Barry the Platipus Jul 26 '22 at 21:51
@platipus_on_fire _"his issue was the encoding"_: please [don't preach what you don't practice](https://stackoverflow.com/revisions/73129906/1) – undetected Selenium Jul 26 '22 at 21:54
`return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-7: character maps to <undefined>` – Amen Aziz Jul 27 '22 at 10:49
Add a line at the top of your file `#_*_coding: utf-8_*_` – undetected Selenium Jul 27 '22 at 10:52
I don't understand where to add the line – Amen Aziz Jul 27 '22 at 10:59
Did you cross-check the updated code snippets within my answer? – undetected Selenium Jul 27 '22 at 11:00
Yes, I checked; it gives the same error – Amen Aziz Jul 27 '22 at 11:03
When I add `# -*- coding: UTF-8 -*` it says utf is not defined – Amen Aziz Jul 27 '22 at 11:09
Are you sure you have used `.visibility_of_element_located()`? – undetected Selenium Jul 27 '22 at 11:36
Yes, I have edited the code; check it. – Amen Aziz Jul 27 '22 at 11:50
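The `'charmap' codec can't encode` traceback quoted in these comments does not come from Selenium: it is what Python raises when text is written to a stream whose encoding cannot represent it, which typically means a Windows console using a code page such as cp1252 (an assumption here, since the asker's setup isn't stated). A `coding:` declaration does not help, because per PEP 263 it only tells Python how to decode the source file itself. The sketch below reproduces the error with an in-memory stream, so no browser or console is needed, and shows one way around it, assuming Python 3.7+ for `reconfigure()`.

```python
import io
import sys

name = "Абдуллаев Парвиз Заирхан оглы"

# Simulate a console that uses cp1252, a single-byte "charmap" codec with no
# Cyrillic letters. Writing the name to it fails just like print() would.
cp1252_console = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
error = None
try:
    cp1252_console.write(name)
    cp1252_console.flush()
except UnicodeEncodeError as exc:
    error = exc
print(error)  # a 'charmap' message of the same kind as the traceback above

# A stream whose encoding can represent Cyrillic accepts the same text.
utf8_console = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
utf8_console.write(name)
utf8_console.flush()

# For the real console (Python 3.7+): switch stdout to UTF-8 before printing.
# Guarded, because IDEs and test runners sometimes replace sys.stdout with an
# object that has no reconfigure() method.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")
print(name)
```

If changing the script is not an option, setting the environment variable `PYTHONIOENCODING=utf-8`, or running Python in UTF-8 mode with `python -X utf8`, has a similar effect on the console streams.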
