
When I use urlread to scrape the website http://www.trackdota.com, the returned content is garbled.

Code:

url = 'http://www.trackdota.com'
urlread(url)

The result looks something like this:

      �Xms�6��_�:�|#��l��+�qҤә�M�n�ǣ�H��
 Z�$��}�/"eIuo���� ��.<ޏt薹`s���ޘ~�t"���.}��

The same happens even when I set the charset explicitly:

url = 'http://www.trackdota.com'
urlread(url,'charset','UTF-8')

My OS is Windows 8 and my MATLAB version is R2013b.
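One way to test whether the garbage is compressed binary data rather than mis-decoded text is sketched below. It assumes, without confirmation for this site, that the server returns a gzip-compressed body that urlread does not decompress. Gzip data always starts with the two bytes 0x1F 0x8B, so the sketch downloads the raw bytes with urlwrite, checks for that signature, and runs gunzip only when it is present. urlreadGzipAware is a hypothetical helper name, not a MATLAB function; save it as urlreadGzipAware.m.

```matlab
function html = urlreadGzipAware(url)
%URLREADGZIPAWARE Hypothetical helper, not part of MATLAB.
%   Downloads URL with urlwrite (which stores the raw bytes untouched),
%   gunzips the result if it starts with the gzip magic bytes 0x1F 0x8B,
%   and returns the text decoded as UTF-8.

rawFile = [tempname '.gz'];   % .gz suffix so gunzip accepts the file
urlwrite(url, rawFile);

% Peek at the first two bytes of the raw download.
fid = fopen(rawFile, 'r');
magic = fread(fid, 2, 'uint8=>uint8');
fclose(fid);

if numel(magic) == 2 && magic(1) == 31 && magic(2) == 139
    % Compressed body: gunzip writes the decompressed copy into tempdir.
    outFiles = gunzip(rawFile, tempdir);
    textFile = outFiles{1};
else
    % Plain body: read the download directly.
    textFile = rawFile;
end

% Decode the bytes as UTF-8 instead of the platform default encoding.
fid = fopen(textFile, 'r', 'n', 'UTF-8');
html = fread(fid, [1 Inf], '*char');
fclose(fid);

% Clean up the temporary files.
if ~strcmp(textFile, rawFile) && exist(textFile, 'file')
    delete(textFile);
end
if exist(rawFile, 'file')
    delete(rawFile);
end
end
```

Usage: `html = urlreadGzipAware('http://www.trackdota.com');`. If the download does not start with the gzip signature, the function just returns the raw text, so the check does no harm on an ordinary page.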

Leohc92
  • Consulting the documentation of `urlread` (http://www.mathworks.com/help/matlab/ref/urlread.html): *If the server returns binary data, the string will contain garbage.* What you're experiencing is probably due to this. – rayryeng Jan 08 '16 at 07:25
  • @rayryeng So it's impossible to read the text on this website? – Leohc92 Jan 08 '16 at 07:37
  • I don't know. I'm only quoting what the documentation says. The site may be returning binary data; I can't tell you for sure. – rayryeng Jan 08 '16 at 07:39
  • @rayryeng OK, anyway, thank you. – Leohc92 Jan 08 '16 at 07:44
  • The documentation suggests using [webread](http://se.mathworks.com/help/matlab/ref/webread.html) instead; have you tried that? – Jørgen Jan 08 '16 at 08:48
  • @Jørgen I know, but webread is only available in newer releases (R2014b and later), not in my R2013b. – Leohc92 Jan 08 '16 at 12:46
  • Sorry, didn't notice the version you used. Have you tried this http://stackoverflow.com/questions/33682063/matlab-urlread-wont-work-for-specific-webpage/33682983 , i.e. specifying the user-agent? I'm at home without a Matlab installation, so I cannot test this. – Jørgen Jan 09 '16 at 15:27
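Following the user-agent suggestion in the last comment, a second sketch bypasses urlread and talks to java.net.URLConnection directly, which MATLAB R2013b can call through its bundled JVM. It sends a User-Agent header, asks for gzip explicitly, and wraps the stream in java.util.zip.GZIPInputStream only when the server reports Content-Encoding: gzip. The helper name urlreadWithHeaders and the example User-Agent string are hypothetical, and whether trackdota.com actually needs either header is untested.

```matlab
function html = urlreadWithHeaders(url, userAgent)
%URLREADWITHHEADERS Hypothetical helper, not part of MATLAB.
%   Fetches URL through java.net.URLConnection, sending a User-Agent
%   header, and gunzips the body when the server reports gzip encoding.

u = java.net.URL(url);
conn = u.openConnection();
conn.setRequestProperty('User-Agent', userAgent);
% Ask for gzip explicitly so a compressed reply is labelled as such.
conn.setRequestProperty('Accept-Encoding', 'gzip');

stream = conn.getInputStream();
% getContentEncoding returns null ([] in MATLAB) when the header is absent.
encoding = char(conn.getContentEncoding());
if strcmpi(encoding, 'gzip')
    stream = java.util.zip.GZIPInputStream(stream);
end

% A Scanner with the \A (start of input) delimiter returns the whole
% stream as one token, decoded as UTF-8.
scanner = java.util.Scanner(stream, 'UTF-8');
scanner.useDelimiter('\A');
if scanner.hasNext()
    html = char(scanner.next());
else
    html = '';
end
scanner.close();
end
```

Usage: `html = urlreadWithHeaders('http://www.trackdota.com', 'Mozilla/5.0');`. If a server compresses the body without setting Content-Encoding, this header check will miss it; checking the first bytes for the gzip signature 0x1F 0x8B is the more robust signal in that case.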

0 Answers