I am looking to use Python to scrape some data from my university's intranet and download all the research papers. I have looked at Python scraping before, but haven't really done any myself. I'm sure I read about a Python scraping framework somewhere; should I use that?
So in essence this is what I need to scrape:
- Authors
- Description
- Field
- Then download the file and rename it with the paper's name.
I will then either put all this in XML or a database, most probably XML, and then develop an interface etc. at a later date.
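To make the storage side concrete, here is a rough sketch of what I imagine the output step could look like, using only the standard library. Everything in it is an assumption on my part: the `Paper` class, its attribute names, the XML element names, and the `.pdf` extension are placeholders, and the actual scraping and downloading are left out entirely.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record for one scraped paper; attribute names are guesses.
    title: str
    authors: list
    description: str
    research_field: str


def safe_filename(title, ext=".pdf"):
    # Collapse every run of characters other than letters, digits,
    # dashes, and underscores into a single underscore.
    # The ".pdf" default is an assumption about the file type.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", title).strip("_")
    return (cleaned or "untitled") + ext


def papers_to_xml(papers):
    # Build one <paper> element per record under a <papers> root.
    root = ET.Element("papers")
    for paper in papers:
        node = ET.SubElement(root, "paper", file=safe_filename(paper.title))
        ET.SubElement(node, "title").text = paper.title
        authors = ET.SubElement(node, "authors")
        for name in paper.authors:
            ET.SubElement(authors, "author").text = name
        ET.SubElement(node, "description").text = paper.description
        ET.SubElement(node, "field").text = paper.research_field
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up example data, just to show the shape of the output.
    example = Paper(
        title="Deep Learning: A Review?",
        authors=["A. Author", "B. Author"],
        description="Placeholder description.",
        research_field="Computer Science",
    )
    print(papers_to_xml([example]))
```

The idea is that the renamed file and its metadata stay linked through the `file` attribute, so an interface built later could look up a paper's download from the XML alone.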
Is this doable? Any ideas on where I should start?
Thanks in advance, LukeJenx
EDIT: The framework is Scrapy
EDIT: Turns out that I nearly killed the server today, so a lecturer is getting the copies from the Network team for me... Thanks!