How can I decode the URLs in UTF-8 and make them work when I open them in a browser?

Question

I have a list containing three URLs, all of them strings.

my_list = ['https://es.wikipedia.org//wiki/Enciclopedia_Brit%C3%A1nica', 'https://es.wikipedia.org//wiki/Instituto_Nacional_de_Estad%C3%ADstica_(Espa%C3%B1a)', 'https://es.wikipedia.org//wiki/Mar%C3%ADa_Isabel_Gea']

As you can see, they contain encoded sequences such as %C3%A1 instead of the proper Spanish characters. I would like them to be written properly in Spanish, and I would like them to still lead to the appropriate webpage if I click on them.

Here is the code I have tried.

import codecs
my_list = ['https://es.wikipedia.org//wiki/Enciclopedia_Brit%C3%A1nica', 'https://es.wikipedia.org//wiki/Instituto_Nacional_de_Estad%C3%ADstica_(Espa%C3%B1a)', 'https://es.wikipedia.org//wiki/Mar%C3%ADa_Isabel_Gea']
for item in my_list:
    item_bytes = str.encode(item)
    item_string = codecs.decode(item_bytes, 'utf-8')
    print(item_string)

However, my "item_string" comes out exactly the same as the original string.

score 0 · Answer 1 · answered Feb 19 '23 at 20:31

That's called URL encoding (also known as percent-encoding), so you need to URL-decode the strings. The unquote function in urllib.parse does exactly that.

Here is an example:

from urllib.parse import unquote

my_list = ['https://es.wikipedia.org//wiki/Enciclopedia_Brit%C3%A1nica', 'https://es.wikipedia.org//wiki/Instituto_Nacional_de_Estad%C3%ADstica_(Espa%C3%B1a)', 'https://es.wikipedia.org//wiki/Mar%C3%ADa_Isabel_Gea']

for item in my_list:
    print(unquote(item))
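
To see why the attempt in the question changes nothing while unquote does, here is a minimal sketch using the first URL from the list. The variable names (encoded, round_trip, a_acute, decoded, re_encoded) are made up for illustration, and the safe=':/' argument is an assumption that happens to suit this particular URL.

```python
from urllib.parse import quote, unquote

encoded = 'https://es.wikipedia.org//wiki/Enciclopedia_Brit%C3%A1nica'

# Encoding to bytes and decoding back as UTF-8 is a plain round trip:
# '%', 'C', '3', 'A', '1' are ordinary ASCII characters, so nothing changes.
round_trip = encoded.encode('utf-8').decode('utf-8')

# '%C3%A1' stands for the two bytes 0xC3 0xA1, which are 'á' in UTF-8.
a_acute = bytes.fromhex('C3A1').decode('utf-8')

# unquote turns every %XX escape back into a byte and decodes the
# resulting byte sequence as UTF-8 (its default encoding).
decoded = unquote(encoded)

# quote goes the other way; 'safe' lists characters to leave unescaped.
re_encoded = quote(decoded, safe=':/')

print(round_trip == encoded)   # True
print(a_acute)                 # á
print(decoded)                 # https://es.wikipedia.org//wiki/Enciclopedia_Británica
print(re_encoded == encoded)   # True
```

Browsers generally accept either form, so the decoded URL should still open the same page. If a tool insists on an ASCII-only URL, quote can produce one again, although for other URLs (for example the one containing parentheses) its output may differ from the original escaping.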
