I know this is an old question, but I stumbled upon this via Google search and found that no one has proposed a solution with only built-in features.
So I quickly wrote my own.
Basically a url string can only contain these characters: A-Z, a-z, 0-9, -, ., _, ~, :, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, %, and =, everything else are url encoded.
URL encoding is pretty straight forward, just a percent sign followed by the hexadecimal digits of the byte values corresponding to the codepoints of illegal characters.
So basically using a simple while loop to iterate the characters, add any character's byte as is if it is not a percent sign, increment index by one, else add the byte following the percent sign and increment index by three, accumulate the bytes and decoding them should work perfectly.
Here is the code:
def url_parse(url):
l = len(url)
data = bytearray()
i = 0
while i < l:
if url[i] != '%':
d = ord(url[i])
i += 1
else:
d = int(url[i+1:i+3], 16)
i += 3
data.append(d)
return data.decode('utf8')
I have tested it and it works perfectly.