I am scraping a site with Python and Selenium. Here is the link as it appears in the HTML source:

<а class="NQWMenuItem" name="SectionElements" href="javascript:void(null);" onclick="NQWClearActiveMenu();Download('saw.dll?Go&_scid=RQqdowdFKUY&ViewID=d\x253adashboard\x257ep\x253a6umggrpo8urqvbmv\x257er\x253a67dmsf5fpr8csc50&Action=Download&SearchID=hmd09g8fe17dagu1l8l463e856&PortalPath=/shared/\x25d0\x25a1\x25d0\x25b5\x25d1\x2580\x25d0\x25b2\x25d0\x25b8\x25d1\x2581/_portal/\x25d0\x25a1\x25d0\x25b5\x25d1\x2580\x25d0\x25b2\x25d0\x25b8\x25d1\x2581\x2520-\x2520\x25d0\x2597\x25d0\x259e\x2520\x25d0\x25b8\x2520\x25d0\x2597\x25d0\x25bd\x25d0\x25a0&Page=\x25d0\x2597\x25d0\x259e\x2520\x25d0\x25b7\x25d0\x25b0\x25d0\x25b4\x25d0\x25b0\x25d1\x2587\x25d0\x25b0\x2520\x25d0\x2597\x25d0\x25bd\x25d0\x25a0&ViewState=4e0eaq3qdoiuvg7v7e2ke0u78i&ItemName=\x25d0\x25bf\x25d1\x2580\x25d0\x25b5\x25d0\x25b4\x25d1\x2581\x25d1\x2582\x25d0\x25b0\x25d0\x25b2\x25d0\x25bb\x25d0\x25b5\x25d0\x25bd\x25d0\x25b8\x25d0\x25b5\x253a\x2520\x25d0\x2597\x25d0\x259e\x2520\x25d0\x25b7\x25d0\x25b0\x25d0\x25b4\x25d0\x25b0\x25d1\x2587\x25d0\x25b0\x2520\x25d0\x2597\x25d0\x25bd\x25d0\x25a0&Format=excel2000&Extension=.xls'); return false" style="">Загрузить из сети в Excel 2000

I can read the onclick string, which contains the URL of the document I need, but the Russian characters in it are encoded as \x25d0, \x25b0, \x25b5, etc.

When I click this link in my browser, the URL becomes:

http://ld3ap03.htsk.ru:7777/analytics/saw.dll?Go&_scid=RQqdowdFKUY&ViewID=d:dashboard~p:6umggrpo8urqvbmv~r:67dmsf5fpr8csc50&Action=Download&SearchID=hmd09g8fe17dagu1l8l463e856&PortalPath=/shared/Сервис/_portal/Сервис - ЗО и ЗнР&Page=ЗО задача ЗнР&ViewState=4e0eaq3qdoiuvg7v7e2ke0u78i&ItemName=представление: ЗО задача ЗнР&Format=excel2000&Extension=.xls

As you can see, there are no \x-encoded characters in it.

What is this \x encoding, and how can I get the correct URL in Python?

Ray

1 Answer

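\x25 is the JavaScript string escape for % (character code 0x25), so \x25d0\x25a1 is really %d0%a1, which is percent-encoded UTF-8. The attribute value can also contain HTML entities. Unescape the entities, turn \x25 back into %, then percent-decode:

urllib.parse.unquote(html.unescape(my_url).replace('\\x25', '%'))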

The HTML-entity step is described here: Decode HTML entities in Python string?
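
As a rough end-to-end sketch of the decoding above, the snippet below pulls the argument out of the Download('...') call, resolves JavaScript \xNN escapes, and percent-decodes the result. The helper names (decode_js_hex_escapes, url_from_onclick) are made up for illustration, BASE_URL is assumed from the address the browser showed in the question, and the onclick sample is a shortened version of the real attribute.

```python
import html
import re
from urllib.parse import unquote, urljoin

# Assumption: base address copied from the URL the browser displayed.
BASE_URL = "http://ld3ap03.htsk.ru:7777/analytics/"


def decode_js_hex_escapes(text):
    """Replace JavaScript \\xNN escapes with the character they denote."""
    return re.sub(
        r"\\x([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


def url_from_onclick(onclick, base_url=BASE_URL):
    """Extract the Download('...') argument and return a readable absolute URL."""
    match = re.search(r"Download\('([^']*)'\)", onclick)
    if match is None:
        raise ValueError("no Download('...') call found in onclick")
    raw = html.unescape(match.group(1))  # &amp; -> & (no-op if absent)
    raw = decode_js_hex_escapes(raw)     # \x25 -> %
    return urljoin(base_url, unquote(raw))  # %d0%a1 -> Cyrillic text


# Shortened sample of the onclick attribute from the question.
onclick = (
    "NQWClearActiveMenu();Download('saw.dll?Go&ViewID=d\\x253adashboard"
    "&PortalPath=/shared/\\x25d0\\x25a1\\x25d0\\x25b5\\x25d1\\x2580"
    "\\x25d0\\x25b2\\x25d0\\x25b8\\x25d1\\x2581&Format=excel2000'); return false"
)
print(url_from_onclick(onclick))
```

With Selenium, the onclick string would typically come from the element's get_attribute("onclick"). If the URL is going to be requested rather than displayed, it is probably safer to skip the unquote step and keep it percent-encoded, since that is the form sent over HTTP.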