
In Python, is it possible to use an HTML parser to get the document.lastModified property of a web page? I'm trying to retrieve the date on which the web page/document was last modified by its owner.

Vishwa Iyer

2 Answers


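A somewhat related question, "I am downloading a file using Python urllib2. How do I check how large the file size is?", suggests that the following (untested) code should work. It reads the Last-Modified header that the server sends with the HTTP response, which is a date string rather than a number:

import urllib2

# Python 2 only: in Python 3, urllib2 became urllib.request
req = urllib2.urlopen("http://example.com/file.zip")
# Raw header value, e.g. 'Wed, 21 Oct 2015 07:28:00 GMT', or None if absent
last_modified = req.info().getheader('last-modified')

You might want to pass a default value as the second argument to getheader(), in case the server does not send the header.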
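For Python 3, where urllib2 no longer exists, a roughly equivalent sketch uses urllib.request and turns the header into a datetime with email.utils.parsedate_to_datetime. The helper names parse_last_modified and fetch_last_modified are hypothetical, and sending a HEAD request instead of downloading the body assumes the server answers HEAD with the same headers as GET:

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional
from urllib.request import Request, urlopen


def parse_last_modified(value: Optional[str]) -> Optional[datetime]:
    """Turn an HTTP date such as 'Wed, 21 Oct 2015 07:28:00 GMT' into a datetime."""
    if not value:
        return None  # header absent or empty
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError on garbage input, newer ones ValueError
        return None


def fetch_last_modified(url: str, timeout: float = 10.0) -> Optional[datetime]:
    """Send a HEAD request and return the parsed Last-Modified header, if any."""
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=timeout) as resp:
        # resp.headers behaves like a dict; .get() returns None if the header is missing
        return parse_last_modified(resp.headers.get("Last-Modified"))
```

For example, fetch_last_modified("http://example.com/file.zip") returns a datetime, or None when the server omits the header. Keep in mind that many dynamically generated pages send no Last-Modified header at all, so this only tells you what the server reports, not necessarily when the owner edited the content.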

holroy

You can also look for a last-modified date in the HTML code itself, most notably in the meta tags. The htmldate module does just that.

Here is how it could work:

1. Install the package:

pip install -U htmldate (or pip3 install -U htmldate / pipenv install htmldate, as you prefer)

2. Retrieve a web page, parse it and output the date:

from htmldate import find_date

# Downloads the page, parses it and returns the date it finds as a string
print(find_date('http://blog.python.org/2016/12/python-360-is-now-available.html'))

(disclaimer: I'm the author)
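If you would rather avoid a dependency, the meta-tag lookup alone can be sketched with the standard library's html.parser. This is not how htmldate works internally: the MODIFIED_KEYS list below is an illustrative assumption covering a few common conventions (Open Graph, Dublin Core, http-equiv), the names MetaDateParser and meta_modified_dates are hypothetical, and the values are returned as raw strings without any date parsing:

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Assumed, non-exhaustive set of meta keys that pages use for a modification date
MODIFIED_KEYS = (
    "article:modified_time",
    "og:updated_time",
    "last-modified",
    "dcterms.modified",
)


class MetaDateParser(HTMLParser):
    """Collect (key, content) pairs from <meta> tags whose key is in MODIFIED_KEYS."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "meta":
            return
        a: Dict[str, str] = {k: (v or "") for k, v in attrs}
        # The key may be given as name=, property= or http-equiv=
        key = (a.get("name") or a.get("property") or a.get("http-equiv") or "").lower()
        if key in MODIFIED_KEYS and a.get("content"):
            self.found.append((key, a["content"]))


def meta_modified_dates(html: str) -> List[Tuple[str, str]]:
    """Return every matching (key, content) pair in document order."""
    parser = MetaDateParser()
    parser.feed(html)
    parser.close()
    return parser.found
```

An empty list means none of the assumed keys were present; in that case a library such as htmldate, which also inspects other parts of the page, is the more thorough option.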

adbar