I learned recently that you can use `wget -r -P ./pdfs -A pdf http://example.com/` to recursively download PDF files from a website. However, this is not cross-platform, as Windows doesn't have `wget`. I want to use Python to achieve the same thing. The only solutions I've seen are non-recursive, e.g. https://stackoverflow.com/a/54618327/3042018
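Here is a rough sketch of the kind of thing I have in mind, using `requests` and `BeautifulSoup` (both extra dependencies) and recursing through same-domain links. The function name `crawl_pdfs` and the `download` flag are just my own invention, and I haven't dealt with depth limits, retries, or politeness (robots.txt, rate limiting):

```python
import os
from urllib.parse import urljoin, urlparse, urldefrag

import requests
from bs4 import BeautifulSoup

def crawl_pdfs(start_url, out_dir="./pdfs", download=True):
    """Recursively collect (and optionally download) PDF links,
    staying on the start page's domain like wget -r does."""
    domain = urlparse(start_url).netloc
    seen = set()   # pages already visited
    pdf_urls = []  # every .pdf link found

    def visit(url):
        url = urldefrag(url).url  # drop #fragments so pages aren't revisited
        if url in seen:
            return
        seen.add(url)
        resp = requests.get(url)
        if "html" not in resp.headers.get("Content-Type", ""):
            return  # only parse responses that are actually HTML
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            if urlparse(link).netloc != domain:
                continue  # stay on the original site
            if link.lower().endswith(".pdf"):
                pdf_urls.append(link)
            else:
                visit(link)  # recurse into other pages on the site

    visit(start_url)

    if download:
        os.makedirs(out_dir, exist_ok=True)
        for link in pdf_urls:
            target = os.path.join(out_dir, os.path.basename(urlparse(link).path))
            if not os.path.exists(target):  # skip anything already downloaded
                with open(target, "wb") as f:
                    f.write(requests.get(link).content)

    return pdf_urls
```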
I would also like to be able to just get the names of the files without downloading them, so I can check whether a file has already been downloaded.
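With a sketch like the one above, I imagine calling `crawl_pdfs(start_url, download=False)` would return the PDF URLs without fetching them, so I could compare their basenames against what is already in `./pdfs`, but I don't know whether that approach is sound.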
There are so many tools available in Python. What is a good solution here? Should I use one of the "mainstream" packages like `scrapy` or `selenium`, or maybe just `requests`? Which is the most suitable for this task, and how do I implement it?