How to detect if a file on a server has changed with Python/Scrapy

Question

This is a follow-up to this question.

I want to download PDF files daily. So far my Scrapy code already works. Now I want to find out if the PDF files have changed.

Does Scrapy have a built-in mechanism for this? I couldn't find a hint in the documentation.

If not, I would download the respective PDF file and compare it with the previous day's PDF file using the approach from this question:

Python library to detect if a file has changed between different runs?

Check out this new open-source monitoring tool, [spidermon](https://spidermon.readthedocs.io/en/latest/index.html). I haven't tried it, but I believe it will work in your case. — Pankaj, Mar 12 '19 at 15:40

score 3 · Accepted Answer · answered Mar 12 '19 at 14:59


Download the PDF the first time and save it.
The next time you download it, calculate the hash of the previously saved file and of the new file. If both hashes are the same, the file has not changed; if they differ, it has.
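A minimal sketch of that comparison, assuming both copies are already on disk. The helper names `file_sha256` and `has_changed`, the choice of SHA-256, and the chunk size are illustrative assumptions, not part of Scrapy:

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so large PDFs are not loaded into memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(previous_path, new_path):
    """Return True if the new file differs from the previous one.

    A missing previous file is treated as a change (first download).
    """
    if not Path(previous_path).exists():
        return True
    return file_sha256(previous_path) != file_sha256(new_path)
```

You could call `has_changed("yesterday.pdf", "today.pdf")` after each daily download (both file names are hypothetical). Storing only the previous day's digest instead of the whole file would also work, since only the hashes are compared.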


balderman
