
I am trying to extract some href values from a web page using Python. The code below works well overall, but the returned href does not handle accented characters properly: they come back percent-encoded. I tried several approaches, but none of them worked.

Here is my code:

links = browser.find_elements_by_xpath(path)
for link in links:
    code = link.get_attribute("href")
    print(code)
    f.write(code + "\n")

For instance, I get this: "http//ww.blabla//Cl%C3%A9ment"
instead of this: "http//ww.blabla//Clément"

DapperDuck
Cmisner

1 Answer

Thanks, Mohsan Ali.

I found an answer thanks to your link. Here is how it works:

import urllib.parse

links = browser.find_elements_by_xpath(path)
for link in links:
    code = link.get_attribute("href")
    # Decode percent-escapes such as %C3%A9 back into "é"
    code = urllib.parse.unquote(code)
    print(code)
    f.write(code + "\n")

I'm on Python 3, so using:

import urllib.parse
urllib.parse.unquote(url)

works fine!
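To see what unquote does without a browser, here is a minimal standalone sketch using the example URL from the question. The second call, with a Latin-1 encoded string, is my own illustration and assumes a site that does not percent-encode in UTF-8; it is not something from the original page.

```python
from urllib.parse import unquote

# Example URL from the question, kept exactly as written there.
encoded = "http//ww.blabla//Cl%C3%A9ment"

# unquote() turns each %XX escape back into a byte and decodes the
# resulting bytes as UTF-8 by default.
decoded = unquote(encoded)
print(decoded)

# Hypothetical case: if the escapes were produced from Latin-1 bytes,
# the encoding can be passed explicitly.
latin1 = unquote("Cl%E9ment", encoding="latin-1")
print(latin1)
```

Both calls should give back the accented "é". With the default settings, an escape that is not valid UTF-8 is replaced by a placeholder character rather than raising an error, so a wrong encoding shows up as odd characters in the output.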

Thanks very much for your quick help.

Cmisner