Python + Selenium encoding nightmare

Question

Using Python and Selenium I'm crawling my website. I need to select values from drop-down menus in order to go from one page to the other.

One value I need to select is "Brésil", which, as you can see, contains non-ASCII letters.

I've added the following at the top of my script:

# -*- coding: utf-8 -*-

But when I try to access the value of the element containing this string, I'm stuck:

el = driver.find_element_by_xpath("/html/body/div/div[2]/ul/li[2]/select")
for option in el.find_elements_by_tag_name('option'):
    if option.text == "Brésil":
        option.click()
        break

Here is the error message I get:

UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if option.text == "Brésil":

I believe this is an encoding issue, as my code works with any other string that doesn't contain accented letters.
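
The mismatch can probably be reproduced without Selenium. The following is only a minimal sketch under two assumptions: that `option.text` is a text (unicode) string, and that the un-prefixed literal is a UTF-8 byte string, as it would be under Python 2 with a UTF-8 source file. The names `option_text` and `literal_bytes` are illustrative stand-ins, not part of my script.

```python
# Stand-in for option.text, assumed to be a text (unicode) string.
option_text = u"Br\u00e9sil"

# Stand-in for the un-prefixed literal "Brésil" under Python 2 with a UTF-8
# source file: a byte string holding the UTF-8 encoding of the word.
literal_bytes = u"Br\u00e9sil".encode("utf-8")

# Text and non-ASCII bytes do not compare equal. Under Python 2 the implicit
# ASCII decode of the bytes fails, which is what triggers the UnicodeWarning.
mismatch = (option_text == literal_bytes)

# Decoding the bytes first (or writing the literal as u"Brésil") puts text
# on both sides, so the comparison can succeed.
match = (option_text == literal_bytes.decode("utf-8"))

print(mismatch)
print(match)
```

If these assumptions hold, the comparison fails because of the literal in my script rather than because of anything in the page itself.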

Any help would be greatly appreciated.

Thanks.

The message is fairly clear. The comparison is failing because `option.text` is not Unicode but `"Brésil"` is. I can't guess what encoding `option.text` is, but you could try `option.text.decode("latin-1")`. — BoarGules, Jan 07 '18 at 16:04
In your script everything is ok. Seems, the problem is related to your system settings. — Ratmir Asanov, Jan 07 '18 at 16:34
Just to clarify, the variable `el` is a WebDriver object from Selenium (this is my understanding). — doingmybest, Jan 07 '18 at 22:11
@RatmirAsanov If I use `if option.text == u"Brésil":` I don't get the error message anymore. But I still can't match the real value from the HTML tag, which is "Brésil" and not "Bresil". Any pointers regarding my system settings would help (I've tried on two computers so far). — doingmybest, Jan 07 '18 at 22:13
Possible duplicate of [how to add non-ascii characters in Xpath, in Scrappy](https://stackoverflow.com/questions/40813809/how-to-add-non-ascii-characters-in-xpath-in-scrappy) — JeffC, Jan 08 '18 at 04:44
Not sure why selenium's `.text` seems to not encode things in expected utf-8, but if you're not looking for the strictest match you could use `unicodedata.normalize('NFC', option.text).casefold() == unicodedata.normalize('NFC', "Brésil").casefold()` from https://docs.python.org/3/howto/unicode.html#comparing-strings — Cynic, Dec 08 '20 at 00:11
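
The suggestions in the comments above (decode bytes, then compare normalized text) can be combined into one helper. This is a hedged sketch for Python 3, not a confirmed fix for the question: `same_text` is a hypothetical helper name, and the UTF-8 default for byte input is an assumption, since the thread never establishes what encoding is involved.

```python
import unicodedata


def same_text(left, right, encoding="utf-8"):
    """Compare two strings after decoding, NFC normalization and case folding."""
    def canonical(value):
        # Bytes are decoded first; the UTF-8 default is an assumption.
        if isinstance(value, bytes):
            value = value.decode(encoding)
        # NFC merges "e" + combining accent into the single code point "é".
        return unicodedata.normalize("NFC", value).casefold()

    return canonical(left) == canonical(right)


composed = "Br\u00e9sil"      # é as a single code point
decomposed = "Bre\u0301sil"   # e followed by a combining acute accent

print(composed == decomposed)            # plain comparison
print(same_text(composed, decomposed))   # normalized comparison
print(same_text(composed, "Bresil"))     # the accent still matters
```

In the loop from the question this would replace the `==` test with something like `same_text(option.text, "Brésil")`. Note that case folding makes the match case-insensitive, which is looser than the original comparison.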

0 Answers