How can I get HTML using Python?

Question

I want to get the HTML document of a specific website.

This code works well:

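import urllib2  # Python 2 HTTP client library

link = "https://www.google.com"
print link
f = urllib2.urlopen(link)  # send a GET request to the URL
myfile = f.read()          # read the whole response body (the HTML)
print myfile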
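The snippets in this question are written for Python 2. On Python 3, `urllib2` was split into `urllib.request` and `urllib.error`. A minimal sketch of the same fetch on Python 3 follows; the helper name `fetch_html` is hypothetical, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download url and return its body decoded as text.

    Assumption: fall back to UTF-8 when the response declares no charset.
    """
    with urlopen(url, timeout=timeout) as resp:
        # Use the charset from the Content-Type header if one is given
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage: `print(fetch_html("https://www.google.com"))`.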

But this code does not work:

import urllib2

link = "https://www.virustotal.com/en/file/7cf757e0943b0a6598795156c156cb90feb7d87d4a22c01044499c4e1619ac57/analysis/"
print link
f = urllib2.urlopen(link)
myfile = f.read()
print myfile

Why doesn't it work for this specific site?

This is not a Python problem but an interesting behaviour of VirusTotal. Even `curl -v https://www.virustotal.com/en/file/7cf757e0943b0a6598795156c156cb90feb7d87d4a22c01044499c4e1619ac57/analysis/` gets a response with `Content-Length: 0`, i.e. an empty body. — , Jul 13 '15 at 08:14
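To reproduce the `curl -v` check from Python, you can look at the status code, the `Content-Length` header and the number of bytes actually received. A minimal Python 3 sketch; `inspect_response` is a hypothetical helper name, not part of any library.

```python
from urllib.request import urlopen


def inspect_response(url, timeout=10):
    """Return (status code, Content-Length header, number of bytes read)."""
    with urlopen(url, timeout=timeout) as resp:
        body = resp.read()
        # getcode() is None for non-HTTP schemes such as data: URLs
        return resp.getcode(), resp.headers.get("Content-Length"), len(body)
```

If `inspect_response(link)` reports a length of 0 for the VirusTotal URL, that matches what `curl` showed: the server, not the Python code, is sending an empty body.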

score 1 · Accepted Answer · answered Jul 13 '15 at 08:30

It is weird, and I don't know why urllib2 doesn't work here.

However, I tried it with Selenium instead, and that worked for me:

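from selenium import webdriver

url = 'https://www.virustotal.com/en/file/7cf757e0943b0a6598795156c156cb90feb7d87d4a22c01044499c4e1619ac57/analysis/'
mydriver = webdriver.PhantomJS()  # start a headless browser
try:
    mydriver.get(url)             # load the page as a real browser would
    page = mydriver.page_source   # HTML of the page as the browser sees it
    print page.encode('utf-8')
finally:
    mydriver.quit()               # shut down the browser process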

If you don't know PhantomJS, it's just a headless browser. You can replace PhantomJS with Firefox (`webdriver.Firefox()`), and it still works.
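PhantomJS has since been deprecated, and later Selenium releases no longer include a PhantomJS driver, so Firefox is the more durable choice today. A minimal Python 3 sketch, assuming Selenium 4 and a local Firefox installation; `fetch_rendered_html` is a hypothetical helper name.

```python
def fetch_rendered_html(url):
    """Load url in headless Firefox and return the page source."""
    # Imported inside the function so this module loads even without Selenium
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # run Firefox without opening a window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if get() fails
```

Usage: `print(fetch_rendered_html(url))` with the VirusTotal URL from the answer.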

It works well. I also found another way, using the VirusTotal API. Thank you. — somputer, Jul 13 '15 at 10:56
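The asker does not say which VirusTotal API call they used. As one possibility, here is a sketch against VirusTotal's v3 REST API, assuming the file-report endpoint `https://www.virustotal.com/api/v3/files/<hash>` with the API key sent in an `x-apikey` header; check the official API reference before relying on it. `build_report_request` and `fetch_report` are hypothetical names.

```python
import json
from urllib.request import Request, urlopen

API_BASE = "https://www.virustotal.com/api/v3/files/"  # assumed v3 endpoint


def build_report_request(file_hash, api_key):
    """Prepare (but do not send) a GET request for a file report."""
    return Request(API_BASE + file_hash, headers={"x-apikey": api_key})


def fetch_report(file_hash, api_key, timeout=30):
    """Send the request and decode the JSON response body."""
    with urlopen(build_report_request(file_hash, api_key), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `fetch_report("7cf757e0943b0a6598795156c156cb90feb7d87d4a22c01044499c4e1619ac57", "YOUR_API_KEY")`. The API returns structured JSON rather than an HTML page, so there is nothing to render or scrape.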

