
I want to scrape the download page of any site using Python to extract the version information and its download link. I'm learning Python and want to do it with BeautifulSoup, but these pages are very complex and it looks pretty hard to find this information. Thanks in advance.

vaibhav1312

1 Answer


Welcome to Stack Overflow! I'm guessing you mean "scrape", because "scrap" means "to throw away".

First things first, you have to use urllib2 (renamed urllib.request in Python 3) to create a file object of the page you want to scrape. Read this to find out how.
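As a rough sketch of this first step, assuming Python 3 (where `urllib2` became `urllib.request`): the helper name `fetch_html`, the 10-second timeout, and the UTF-8 fallback are my own choices for illustration, not something the library requires.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Open the URL and return the page body as text.

    fetch_html is a hypothetical helper name; the timeout and the
    UTF-8 fallback are assumptions made for this sketch.
    """
    # urlopen returns a file-like response object that supports read()
    with urlopen(url, timeout=timeout) as response:
        # Use the charset the server declared, if it declared one
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html` with the address of the download page you care about should give you the HTML as a string, which is what you hand to the parser later on.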

Then, you have to figure out what information you want to get off the page itself by inspecting the HTML content of the page.

Finally, you pass the file object to Beautiful Soup's parser, and navigate the HTML to return the information you're looking to get.
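The parsing step could look something like the following. This is only a sketch: the sample HTML, the `find_downloads` helper, and the idea that a version looks like dotted digits (`1.2.3`) are all assumptions made for illustration. Real download pages differ, so you would adapt the tag and pattern to whatever you find when inspecting the page.

```python
import re

from bs4 import BeautifulSoup

# Assumption: a version number looks like digits separated by dots, e.g. 1.2.3
VERSION_RE = re.compile(r"\d+(?:\.\d+)+")


def find_downloads(html):
    """Return (version, href) pairs for links that seem to carry a version.

    find_downloads is a hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Only look at <a> tags that actually have an href attribute
    for link in soup.find_all("a", href=True):
        # Check the link target first, then fall back to the link text
        match = VERSION_RE.search(link["href"]) or VERSION_RE.search(link.get_text())
        if match:
            results.append((match.group(0), link["href"]))
    return results


# Made-up sample page standing in for a real download page
SAMPLE = """
<ul>
  <li><a href="/files/tool-1.2.3.tar.gz">Download tool 1.2.3</a></li>
  <li><a href="/about.html">About</a></li>
  <li><a href="/files/tool-2.0.zip">Latest release</a></li>
</ul>
"""

print(find_downloads(SAMPLE))
```

Links without anything that looks like a version (the "About" link here) are skipped. On a real site you would pass in the HTML you fetched instead of `SAMPLE`.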

For future reference, BeautifulSoup has beautiful documentation. If you ever want to get good at programming, you have to learn how to read docs; it really only gets harder from here.

kreativitea
  • Unless completely necessary, I tend to avoid `urllib2` in favour of http://docs.python-requests.org/en/latest/ – Jon Clements Nov 08 '12 at 17:54
  • @JonClements I do too, but if someone states that they're a beginner, I usually give advice based around the standard library. Requests is a couple of months (or weeks, or years, depending on how dedicated a student is) down the line. – kreativitea Nov 08 '12 at 18:06