This is a follow-up to this question.

I want to download PDF files daily. So far my Scrapy code already works. Now I want to find out if the PDF files have changed.

Does Scrapy have a built-in mechanism for this? I couldn't find a hint in the documentation.

If not, I would download each PDF file and compare it with the previous day's copy, using the approach from this question:

Python library to detect if a file has changed between different runs?

Egon Allison
R0byn
    Check out this new open-source monitoring tool, [spidermon](https://spidermon.readthedocs.io/en/latest/index.html). I haven't tried it, but I believe it will work in your case. – Pankaj Mar 12 '19 at 15:40

1 Answer

Download the PDF the first time and save it.
On each later run, download it again and calculate the hash of both the previous file and the new file. If the two values are the same, the file has not changed.
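
A minimal sketch of that comparison, assuming both downloads are saved on disk. The helper names `file_sha256` and `has_changed` are hypothetical, and SHA-256 is just one reasonable choice of hash; neither is part of Scrapy.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large PDFs are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(old_path, new_path):
    """Return True if there is no previous file or the contents differ."""
    if not Path(old_path).exists():
        # First run: nothing to compare against, so treat it as changed.
        return True
    return file_sha256(old_path) != file_sha256(new_path)
```

For example, `has_changed("yesterday/report.pdf", "today/report.pdf")` (paths made up for illustration) would tell you whether to keep the new copy. Note that this compares raw bytes: a PDF that is regenerated with a new embedded creation timestamp will count as changed even if the visible content is identical.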

balderman