I've been reluctant to post a question about this, but after 3 days of Googling I still can't get this to work. Long story short, I'm making a raid gear tracker for WoW.
I'm using BS4 to handle the web scraping; I'm able to pull the page and scrape the info I need from it. The problem is when there is an extended ASCII character in the player's name, e.g. thermíte (the í is Alt+161):
http://us.battle.net/wow/en/character/garrosh/thermíte/advanced
I'm trying to figure out how to re-encode the URL so it looks like this:
http://us.battle.net/wow/en/character/garrosh/therm%C3%ADte/advanced
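The target form is plain percent-encoding: the name's UTF-8 bytes (í is C3 AD) written as %XX. A minimal sketch, assuming Python 3, where the standard library's urllib.parse.quote does exactly this. On Python 2 the function is urllib.quote and should be given a UTF-8 byte string, i.e. urllib.quote(name.encode('utf-8')).

```python
from urllib.parse import quote, unquote

name = "thermíte"

# quote() encodes the str as UTF-8, then replaces each unsafe byte with %XX
encoded = quote(name)
print(encoded)  # therm%C3%ADte

# unquote() reverses the mapping, decoding the %XX bytes back as UTF-8
assert unquote(encoded) == name
```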
I'm using tkinter for the GUI. The user selects their realm from a dropdown and then types the character name into an entry field:
namefield = Entry(window, textvariable=toonname)
I have a scraping function that performs the initial scrape of the main profile page. This is where I assign the value of namefield to a global variable. I also tried passing it directly to the scraper with this:
namefield = Entry(window, textvariable=toonname, command=firstscrape)
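Tk's Entry widget does not support a command option (passing one raises a TclError for an unknown option), so the usual way to hand the typed name to a scraper without a global is to bind the Return key or add a Button, and read the StringVar inside the callback. A sketch under those assumptions; firstscrape here is a hypothetical stand-in that only builds and returns the URL, and the realm is hard-coded as in the question.

```python
from urllib.parse import quote


def firstscrape(name):
    """Hypothetical stand-in for the real scraper: builds and returns the URL."""
    url = ("http://us.battle.net/wow/en/character/garrosh/"
           + quote(name, safe="") + "/advanced")
    print("Would scrape:", url)
    return url


def main():
    # Imported here so firstscrape() can be used without a display or Tk install
    import tkinter as tk

    window = tk.Tk()
    toonname = tk.StringVar()
    namefield = tk.Entry(window, textvariable=toonname)
    namefield.pack()

    # Entry has no command= option, so react to the Return key instead...
    namefield.bind("<Return>", lambda event: firstscrape(toonname.get()))
    # ...or give the user an explicit button; Button does accept command=
    tk.Button(window, text="Scrape",
              command=lambda: firstscrape(toonname.get())).pack()

    window.mainloop()
```

Calling main() opens the window; toonname.get() delivers the name as a normal str, so the callback can pass it straight to the scraper.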
I thought I was close, because when it passed "thermíte", the scrape function would print out "therm\xC3\xADte", so all I needed to do was replace each '\x' with '%' and I'd be golden. But it didn't work: mastername.find('\x') would find instances of it in the string, but mastername.replace('\x','%') wouldn't actually replace anything.
I tried various combinations of r'\x', '\%', r'\x', etc. No dice.
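A likely explanation, assuming Python 3: "therm\xC3\xADte" is only how the UTF-8 bytes are displayed. The bytes object contains the single bytes 0xC3 and 0xAD, not the characters backslash and x, so there is nothing for replace() to find. The sketch below shows this and percent-encodes the bytes directly; percent_encode_non_ascii is a hypothetical helper for illustration.

```python
name = "thermíte"
raw = name.encode("utf-8")

print(raw)                # b'therm\xc3\xadte' (display form only)
print(len(raw))           # 9: the í is two bytes, C3 and AD
print(b"\\x" in raw)      # False: no literal backslash-x in the data
print("\\x" in repr(raw)) # True: the backslashes exist only in the repr


def percent_encode_non_ascii(data):
    """Percent-escape every byte >= 0x80; ASCII bytes pass through unchanged."""
    return "".join(chr(b) if b < 0x80 else "%{:02X}".format(b) for b in data)


print(percent_encode_non_ascii(raw))  # therm%C3%ADte
```

This helper does not escape reserved ASCII characters such as spaces or '/', which is why urllib.parse.quote is the safer choice in practice.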
Lastly, when I try things like encoding to Latin-1 and then decoding back to UTF-8, I get errors saying it can't handle the extended ASCII character.
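One plausible source of those errors, assuming Python 3: Latin-1 stores í as the single byte 0xED, which is not a valid UTF-8 sequence when followed by an ordinary letter, so decoding those bytes as UTF-8 fails. Encoding and decoding with the same codec round-trips cleanly.

```python
name = "thermíte"

latin1_bytes = name.encode("latin-1")  # b'therm\xedte': í is the single byte 0xED
utf8_bytes = name.encode("utf-8")      # b'therm\xc3\xadte': í is the two bytes C3 AD

try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xED starts a multi-byte sequence, but the next byte is 't' (0x74)
    print("latin-1 bytes are not valid UTF-8:", exc)

# Decoding with the same codec that was used to encode works
assert utf8_bytes.decode("utf-8") == name
```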
urlpart1 = "http://us.battle.net/wow/en/character/garrosh/"
urlpart2 = mastername
urlpart3 = "/advanced"
url = urlpart1 + urlpart2 + urlpart3
That's what I've been using to rebuild the final URL (for now I'm leaving the realm hard-coded until I can get the name problem fixed).
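The concatenation above works if the name part is percent-encoded before it is joined. A sketch assuming Python 3; build_profile_url is a hypothetical helper, and quoting the realm as well assumes realm names are passed the same way as the character name.

```python
from urllib.parse import quote

BASE = "http://us.battle.net/wow/en/character/"


def build_profile_url(realm, name):
    """Build the advanced-profile URL, percent-encoding each path segment.

    safe="" makes quote() escape '/' as well, so a stray slash in user
    input cannot add an extra path segment.
    """
    return BASE + quote(realm, safe="") + "/" + quote(name, safe="") + "/advanced"


url = build_profile_url("garrosh", "thermíte")
print(url)  # http://us.battle.net/wow/en/character/garrosh/therm%C3%ADte/advanced
```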
TL;DR:
I'm trying to take a URL with extended ASCII characters, like:
http://us.battle.net/wow/en/character/garrosh/thermíte/advanced
and turn it into a URL that a browser can easily process, like:
http://us.battle.net/wow/en/character/garrosh/therm%C3%ADte/advanced
It needs to work for all of the usual extended ASCII characters.
I hope this made sense.
Here is a Pastebin of the full script as it stands (some things in it aren't used until later on): pastebin link