I can do the following in the Python shell:

>>> import urllib
>>> s='https://www.microsoft.com/de-at/store/movies/american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql'
>>> print urllib.unquote(s)
https://www.microsoft.com/de-at/store/movies/american-pie-präsentiert-nackte-tatsachen/8d6kgwzl63ql

However, when I do the same thing inside a Python program, the URL is decoded incorrectly:

url = res.history[0].url if res.history else res.url
print '1111', url
print '2222', urllib.unquote(url)

1111 https://www.microsoft.com/de-at/store/movies/american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql
2222 https://www.microsoft.com/de-at/store/movies/american-pie-prÃ¤sentiert-nackte-tatsachen/8d6kgwzl63ql

Why is the URL decoded correctly in the Python shell but not in my program?

  • Try adding the line `# -*- coding: utf-8 -*-` at the top of the file to see if it helps. – Hai Vu Dec 27 '15 at 05:27
  • [Why did you post this question twice?](http://stackoverflow.com/questions/34477648/urldecoding-requests) I can't see any difference. – Remi Guan Dec 27 '15 at 06:14

1 Answer

The following worked to fix the issue:

url = urllib.unquote(str(res.url)).decode('utf-8', 'ignore')

res.url is a unicode string, and urllib.unquote does not handle that well in Python 2: the percent-escaped bytes do not get interpreted as UTF-8. So the solution is to first convert it to a byte string (as it was in the Python interpreter), unquote that, and then decode the result as UTF-8 to get Unicode.
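To make the difference concrete, here is a minimal sketch that should run on both Python 2.7 and Python 3. The helper names `decode_url` and `mojibake` are hypothetical, not part of urllib or requests. It assumes the URL is plain ASCII with percent escapes, and it assumes that the broken output comes from each escaped byte being treated as a separate Latin-1 character, which is what Python 2's `urllib.unquote` appears to do for unicode input.

```python
# -*- coding: utf-8 -*-
# Sketch only; decode_url and mojibake are hypothetical helper names.
try:
    # Python 2: byte string in, byte string out
    from urllib import unquote as unquote_bytes
except ImportError:
    # Python 3: the closest equivalent returns bytes
    from urllib.parse import unquote_to_bytes as unquote_bytes


def decode_url(url):
    """Percent-decode url and interpret the resulting bytes as UTF-8."""
    if not isinstance(url, bytes):
        # Assumption: a percent-encoded URL is pure ASCII, so this is lossless.
        url = url.encode('ascii')
    return unquote_bytes(url).decode('utf-8')


def mojibake(url):
    """Reproduce the bad output: read each decoded byte as Latin-1."""
    if not isinstance(url, bytes):
        url = url.encode('ascii')
    return unquote_bytes(url).decode('latin-1')


encoded = (u'https://www.microsoft.com/de-at/store/movies/'
           u'american-pie-pr%C3%A4sentiert-nackte-tatsachen/8d6kgwzl63ql')

good = decode_url(encoded)  # contains u'pr\xe4sentiert' (one character for the umlaut)
bad = mojibake(encoded)     # contains u'pr\xc3\xa4sentiert' (two characters)
```

Unlike the one-liner above, this sketch decodes strictly rather than with 'ignore', so a URL whose escaped bytes are not valid UTF-8 raises UnicodeDecodeError instead of silently losing characters. Which behaviour is preferable depends on the URLs being handled.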
