
I have a list containing three URLs, all of type str.

my_list = ['https://es.wikipedia.org//wiki/Enciclopedia_Brit%C3%A1nica', 'https://es.wikipedia.org//wiki/Instituto_Nacional_de_Estad%C3%ADstica_(Espa%C3%B1a)', 'https://es.wikipedia.org//wiki/Mar%C3%ADa_Isabel_Gea']

As you can see, some characters appear as escape sequences such as %C3%A1 instead of the proper Spanish letters. I would like the URLs to be written properly in Spanish and still lead me to the appropriate webpage when I click on them.

Here is the code I have tried.

import codecs

my_list = ['https://es.wikipedia.org//wiki/Enciclopedia_Brit%C3%A1nica', 'https://es.wikipedia.org//wiki/Instituto_Nacional_de_Estad%C3%ADstica_(Espa%C3%B1a)', 'https://es.wikipedia.org//wiki/Mar%C3%ADa_Isabel_Gea']
for item in my_list:
    # Encode the str to UTF-8 bytes
    item_bytes = item.encode('utf-8')
    # Decode the bytes back into a str
    item_string = codecs.decode(item_bytes, 'utf-8')
    print(item_string)

However, item_string is printed exactly as before, with the escape sequences still in it.
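A minimal check of what that loop actually does (a sketch; str.isascii() requires Python 3.7 or later, and the helper name utf8_round_trip is hypothetical, introduced only for illustration). Each escape such as %C3%A1 is six ordinary ASCII characters rather than the letter á, so a UTF-8 encode/decode round trip hands back an identical string:

```python
my_list = [
    'https://es.wikipedia.org//wiki/Enciclopedia_Brit%C3%A1nica',
    'https://es.wikipedia.org//wiki/Instituto_Nacional_de_Estad%C3%ADstica_(Espa%C3%B1a)',
    'https://es.wikipedia.org//wiki/Mar%C3%ADa_Isabel_Gea',
]

def utf8_round_trip(s: str) -> str:
    """Hypothetical helper: encode a str to UTF-8 bytes and decode it back."""
    return s.encode('utf-8').decode('utf-8')

for item in my_list:
    # First value: the string holds only ASCII characters ('%', 'C', '3', ...).
    # Second value: the round trip returns exactly the same string.
    print(item.isascii(), utf8_round_trip(item) == item)  # prints "True True"
```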

Asked by Javi, edited by mkrieger1
1 Answer


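That's called URL encoding (also known as percent-encoding), so you need to URL-decode the strings. Your attempt changes nothing because the strings are already str objects made only of ASCII characters: %C3%A1 is six characters, not the letter á, so encoding to UTF-8 and decoding back returns the same text. The unquote function in urllib.parse does the decoding.

Here is an example: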

from urllib.parse import unquote

my_list = ['https://es.wikipedia.org//wiki/Enciclopedia_Brit%C3%A1nica', 'https://es.wikipedia.org//wiki/Instituto_Nacional_de_Estad%C3%ADstica_(Espa%C3%B1a)', 'https://es.wikipedia.org//wiki/Mar%C3%ADa_Isabel_Gea']

for item in my_list:
    # unquote() replaces %XX escapes and decodes the bytes as UTF-8 by default
    print(unquote(item))
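To make it clearer what unquote does, here is a sketch that performs its two steps separately with urllib.parse.unquote_to_bytes, then checks the result against unquote. It also shows that urllib.parse.quote turns the readable form back into the percent-encoded one, in case you need a strictly ASCII link. The safe=':/()' value is an assumption chosen to match these particular URLs (they contain literal parentheses), not a general rule:

```python
from urllib.parse import quote, unquote, unquote_to_bytes

my_list = [
    'https://es.wikipedia.org//wiki/Enciclopedia_Brit%C3%A1nica',
    'https://es.wikipedia.org//wiki/Instituto_Nacional_de_Estad%C3%ADstica_(Espa%C3%B1a)',
    'https://es.wikipedia.org//wiki/Mar%C3%ADa_Isabel_Gea',
]

for item in my_list:
    # Step 1: replace each %XX escape with the byte it names (e.g. %C3 -> 0xC3).
    raw = unquote_to_bytes(item)
    # Step 2: decode those bytes as UTF-8, so b'\xc3\xa1' becomes 'á'.
    readable = raw.decode('utf-8')
    # unquote() performs both steps in one call (UTF-8 is its default encoding).
    assert readable == unquote(item)

    # quote() reverses the process. ':' and '/' stay literal, and '(' and ')'
    # are marked safe because the original URLs contain them unescaped.
    assert quote(readable, safe=':/()') == item
    print(readable)
```

Keeping both forms lets you show the readable text to users while still having the exact original string available as the link target.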
Answer by Alex Kosh