
I want to fetch the HTML document of a specific website.

This code works well:

import urllib2

link = "https://www.google.com"
print link
f = urllib2.urlopen(link)
myfile = f.read()
print myfile

But this code does not work:

import urllib2

link = "https://www.virustotal.com/en/file/7cf757e0943b0a6598795156c156cb90feb7d87d4a22c01044499c4e1619ac57/analysis/"
print link
f = urllib2.urlopen(link)
myfile = f.read()
print myfile

Why does it not work for this specific site?

somputer
  • Do you get an error or just an empty file? – NDevox Jul 13 '15 at 07:50
  • How do you know it's not working? – Peter Wood Jul 13 '15 at 07:51
  • This is not a Python problem but an interesting behaviour of VirusTotal. Even using `curl -v https://www.virustotal.com/en/file/7cf757e0943b0a6598795156c156cb90feb7d87d4a22c01044499c4e1619ac57/analysis/` returns `Content-Length: 0`. –  Jul 13 '15 at 08:14
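To check for the `Content-Length: 0` behaviour from Python itself, a minimal sketch along these lines may help. It is written for Python 3, where `urllib2` became `urllib.request`; the helper names `summarize` and `fetch_summary` are made up for illustration, and what the site returns today may differ from what was observed in the comment above.

```python
import urllib.request


def summarize(status, headers, body):
    """Return a one-line summary of an HTTP response."""
    # headers can be any mapping with .get(), e.g. the response's header object
    declared = headers.get("Content-Length", "missing")
    return "status=%s content-length=%s bytes-read=%d" % (status, declared, len(body))


def fetch_summary(url):
    """Fetch url and summarize the response (performs a real network request)."""
    with urllib.request.urlopen(url) as response:
        return summarize(response.status, response.headers, response.read())


# Hypothetical usage (needs network access):
# print(fetch_summary("https://www.google.com"))
```

If the summary shows a successful status together with zero bytes read, the request itself went through and the server simply sent an empty body, which would match the `curl` observation rather than point to a bug in the script.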

1 Answer


It is weird, and I don't know why urllib2 does not work here.

However, I tried the following code with Selenium, and it worked for me:

from selenium import webdriver
url = 'https://www.virustotal.com/en/file/7cf757e0943b0a6598795156c156cb90feb7d87d4a22c01044499c4e1619ac57/analysis/'
mydriver = webdriver.PhantomJS()
mydriver.get(url)
page = mydriver.page_source
print page.encode('utf-8')

If you don't know PhantomJS, it's just a headless browser. You can replace PhantomJS with Firefox and it still works.
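As a rough sketch of the Firefox variant, the function below takes any Selenium driver object, so the browser can be swapped without touching the rest of the code. The name `fetch_rendered_html` is hypothetical, and the usage lines assume the `selenium` package plus a working Firefox driver are installed. As far as I know, newer Selenium releases no longer ship PhantomJS support, so Firefox is probably the safer choice there.

```python
def fetch_rendered_html(driver, url):
    """Load url in the given browser driver and return the rendered HTML."""
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if the page load fails
        driver.quit()


# Hypothetical usage (requires selenium and a Firefox driver on PATH):
# from selenium import webdriver
# html = fetch_rendered_html(webdriver.Firefox(), "https://www.virustotal.com/")
# print(html)
```

Passing the driver in, rather than creating it inside the function, keeps the browser choice in one place and makes sure `quit()` is called so no browser process is left running.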

omri_saadon