
Using Python and Selenium, I'm crawling my website. I need to select values from drop-down menus in order to go from one page to another.

One value I need to select is "Brésil", which, as you can see, contains non-ASCII letters.

I've added the following line at the top of my script:

# -*- coding: utf-8 -*-
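A coding declaration like the one above only tells the interpreter how to decode the bytes of the source file. In Python 2 it does not turn a plain "Brésil" literal into Unicode: the literal stays a byte string holding the UTF-8 encoding, while Selenium's `option.text` is a Unicode string. The sketch below uses Python 3 syntax and models the Python 2 literal explicitly with `.encode("utf-8")`; the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch of the two string types involved. In Python 2, a plain
# "Brésil" literal in a UTF-8 source file is stored as bytes, while
# Selenium's option.text is a unicode (text) string.

py2_literal = "Brésil".encode("utf-8")  # the bytes Python 2 keeps for "Brésil"
option_text = u"Brésil"                 # the kind of value option.text returns

print(repr(py2_literal))                   # b'Br\xc3\xa9sil'
print(len(py2_literal), len(option_text))  # 7 6: "é" takes two bytes in UTF-8
```

Under Python 2, writing the literal as `u"Brésil"` (or adding `from __future__ import unicode_literals` at the top of the file) makes both sides of the comparison Unicode.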

But when I try to compare the value of the element containing this string, I get stuck:

el = driver.find_element_by_xpath("/html/body/div/div[2]/ul/li[2]/select")
for option in el.find_elements_by_tag_name('option'):
    if option.text == "Brésil":
        option.click()
        break
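One way to make the loop above robust to the literal's type is to decode a byte-string target before comparing it with `option.text`. The helper below is a hypothetical sketch, not part of Selenium: `click_option_by_text` and `FakeOption` are illustrative names, it assumes the source file is saved as UTF-8, and `FakeOption` stands in for a WebElement so the example runs without a browser (run it under Python 3).

```python
# -*- coding: utf-8 -*-
# Hypothetical helper: clicks the first option whose visible text matches
# `target`, making sure both sides of the comparison are text, not bytes.

def click_option_by_text(options, target, encoding="utf-8"):
    """Click the first option whose .text equals target; return True if found.

    `options` can be any iterable of objects with a .text attribute and a
    .click() method, such as Selenium WebElements for <option> tags.
    """
    if isinstance(target, bytes):
        # A byte string (like a plain Python 2 literal) is decoded first,
        # assuming it was written in a UTF-8 source file.
        target = target.decode(encoding)
    for option in options:
        if option.text == target:
            option.click()
            return True
    return False


class FakeOption:
    """Stand-in for a WebElement so the sketch runs without a browser."""

    def __init__(self, text):
        self.text = text
        self.clicked = False

    def click(self):
        self.clicked = True


options = [FakeOption(u"Argentine"), FakeOption(u"Brésil"), FakeOption(u"Chili")]
print(click_option_by_text(options, "Brésil".encode("utf-8")))  # True
print([o.text for o in options if o.clicked])                   # ['Brésil']
```

With a live driver, `el.find_elements_by_tag_name('option')` would be passed as `options`. Selenium also provides a `Select` wrapper (`from selenium.webdriver.support.ui import Select`) whose `select_by_visible_text()` does the matching itself; under Python 2 you would still pass it a Unicode literal such as `u"Brésil"`.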

Here is the error message I get:

UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if option.text == "Brésil":

I believe this is an encoding issue, as my code works with any other string that doesn't contain accented letters.
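Python 2 emits this warning when it compares a byte string with a Unicode string: it tries to decode the bytes implicitly with the default ASCII codec, and the UTF-8 bytes for "é" (0xC3 0xA9) are not valid ASCII, so it gives up and treats the values as unequal. ASCII-only strings decode without trouble, which is consistent with unaccented values working. The Python 3 sketch below performs that decode step explicitly; `ascii_decode_error` is a hypothetical helper name.

```python
# Python 3 sketch of the implicit conversion Python 2 attempts when a byte
# string is compared with a unicode string.

def ascii_decode_error(data):
    """Return why ASCII decoding of `data` fails, or None if it succeeds."""
    try:
        data.decode("ascii")  # Python 2's default codec for implicit conversion
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


print(ascii_decode_error(b"Bresil"))         # None: ASCII-only bytes convert fine
print(ascii_decode_error(b"Br\xc3\xa9sil"))  # ordinal not in range(128)
print(b"Br\xc3\xa9sil".decode("utf-8") == u"Brésil")  # True with the right codec
```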

Any help would be greatly appreciated.

Thanks.

Ratmir Asanov
doingmybest
  • The message is fairly clear. The comparison is failing because `option.text` is not Unicode but `"Brésil"` is. I can't guess what encoding `option.text` is, but you could try `option.text.decode("latin-1")`. – BoarGules Jan 07 '18 at 16:04
  • Your script looks fine. It seems the problem is related to your system settings. – Ratmir Asanov Jan 07 '18 at 16:34
  • Just to clarify, variable `el` is a WebDriver from Selenium (this is my understanding). – doingmybest Jan 07 '18 at 22:11
  • @RatmirAsanov If I use `if option.text == u"Brésil":`, I no longer get the error message. But I still can't filter on the real value from the HTML tag, which is "Brésil" and not "Bresil". Any pointers regarding my system settings would help (I've tried on two computers so far). – doingmybest Jan 07 '18 at 22:13
  • Possible duplicate of [how to add non-ascii characters in Xpath, in Scrappy](https://stackoverflow.com/questions/40813809/how-to-add-non-ascii-characters-in-xpath-in-scrappy) – JeffC Jan 08 '18 at 04:44
  • Not sure why selenium's `.text` seems to not encode things in expected utf-8, but if you're not looking for the strictest match you could use `unicodedata.normalize('NFC', option.text).casefold() == unicodedata.normalize('NFC', "Brésil").casefold()` from https://docs.python.org/3/howto/unicode.html#comparing-strings – Cynic Dec 08 '20 at 00:11
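The normalization suggested in the last comment can be wrapped in a small helper. One possible reason a `u"Brésil"` literal still fails to match is that the page spells "é" differently, for example as a plain "e" followed by a combining acute accent (U+0301); that renders identically but is a different code-point sequence, and normalizing both sides to NFC makes the two spellings compare equal. This is an assumption about the page, not something the question confirms. `loose_equal` is a hypothetical name, and `casefold()` exists only in Python 3 (under Python 2, `lower()` is the closest substitute).

```python
import unicodedata


def loose_equal(a, b):
    """Compare two text strings after NFC normalization and case folding."""
    return (unicodedata.normalize("NFC", a).casefold()
            == unicodedata.normalize("NFC", b).casefold())


composed = u"Br\u00e9sil"     # "é" as a single code point
decomposed = u"Bre\u0301sil"  # "e" followed by a combining acute accent

print(composed == decomposed)                  # False: different code points
print(loose_equal(composed, decomposed))       # True
print(loose_equal(u"BR\u00c9SIL", composed))   # True: case is ignored too
```

In the original loop, `if loose_equal(option.text, u"Brésil"):` would take the place of the `==` test.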

0 Answers