How to scrape "download page" of any site using python

Question

I want to scrape the download page of any site using Python to extract the version information and its download link. I'm learning Python and want to do it with BeautifulSoup, but these pages are very complex and it looks pretty hard to find this information. Thanks in advance.

FYI it's __scrape__ (and __scraping__, __scraped__, __scraper__) not scrap — DisappointedByUnaccountableMod, Apr 23 '21 at 09:34

score 4 · Answer 1 · edited May 23 '17 at 10:24


Welcome to Stack Overflow! I'm guessing you mean "scrape", because "scrap" means "to throw away".

First things first, you have to use urllib2 (urllib.request in Python 3) to create a file object of the page you want to scrape. Read this to find out how.
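A minimal sketch of this first step, assuming Python 3, where the `urllib2` functionality lives in `urllib.request`. The `fetch_html` helper, its `timeout` default, and the user-agent string are made up for illustration and are not part of the original answer.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # Some sites reject the default Python user agent; this value is a placeholder.
    request = Request(url, headers={"User-Agent": "download-page-scraper-example"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call (needs network access, so it is left commented out):
# html = fetch_html("https://example.com/downloads/")
```

The returned string can be handed straight to a parser. Returning text rather than the open file object keeps the network connection from staying open while you parse.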

Then, you have to figure out what information you want to get off the page by inspecting its HTML content.

Finally, you pass the file object to BeautifulSoup's parser, and navigate the HTML to return the information you're looking for.
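The parsing step might look roughly like the sketch below. It assumes the `beautifulsoup4` package is installed, and it uses two invented heuristics: a link counts as a download if its URL ends in one of a few common archive or installer extensions, and a version is any dotted number such as `1.2.3`. Real download pages differ a lot, so expect to adjust both after inspecting the HTML of the page you care about. The sample HTML and the `example.com` URLs are placeholders.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical heuristics; adapt them to the page you are scraping.
DOWNLOAD_EXTENSIONS = (".zip", ".tar.gz", ".exe", ".msi", ".dmg")
VERSION_PATTERN = re.compile(r"\d+(?:\.\d+)+")


def find_downloads(html, base_url):
    """Return (version, absolute_url) pairs for links that look like downloads."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for link in soup.find_all("a", href=True):
        href = link["href"]
        if not href.lower().endswith(DOWNLOAD_EXTENSIONS):
            continue
        # Look for a version number in the link text first, then in the URL.
        match = VERSION_PATTERN.search(link.get_text()) or VERSION_PATTERN.search(href)
        version = match.group(0) if match else None
        # Resolve relative links against the page URL.
        results.append((version, urljoin(base_url, href)))
    return results


sample_html = """
<ul>
  <li><a href="/files/tool-1.2.3.zip">Download tool 1.2.3</a></li>
  <li><a href="/about.html">About</a></li>
  <li><a href="https://cdn.example.com/tool-latest.exe">Latest installer</a></li>
</ul>
"""
downloads = find_downloads(sample_html, "https://example.com/downloads/")
for version, url in downloads:
    print(version, url)
```

Links with no recognizable version are kept with `None` as the version, so you can see what the heuristic missed instead of silently losing them.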

For future reference, BeautifulSoup has beautiful documentation. If you ever want to get good at programming, you have to learn how to read docs; it really only gets harder from here.

edited May 23 '17 at 10:24 by Community

answered Nov 08 '12 at 17:48 by kreativitea

Unless completely necessary, I tend to avoid `urllib2` in favour of http://docs.python-requests.org/en/latest/ – Jon Clements Nov 08 '12 at 17:54
@JonClements I do too, but if someone states that they're a beginner, I usually give advice based around the standard library. Requests is a couple of months (or weeks, or years, depending on how dedicated the student is) down the line. – kreativitea Nov 08 '12 at 18:06
