Finding a specific url in HTML

Question

I'm a new member; sorry about my English, it isn't my first language. I would like to write a Python program that extracts a specific URL from an HTML page (http://www.kernel.org/pub/linux/kernel/v3.0/). I was able to print all the links on that page in my shell, but I don't know how to extract a specific URL, for example linux-3.6.7.tar.bz2. How can I do that?

I have another question: I would like the user to choose which kernel to download to their PC by specifying the version, for example 3.2, 3.3, 3.6, etc. How can I do that? Maybe with a regular expression?

P.S.: I imported urllib2, HTMLParser, BeautifulSoup and re.

Please refer to this link: http://stackoverflow.com/questions/1080411/retrieve-links-from-web-page-using-python-and-beautiful-soup — imsome1, Nov 24 '12 at 11:21

score 0 · Answer 1 · answered Nov 24 '12 at 11:35

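from urllib2 import Request, urlopen
from BeautifulSoup import BeautifulSoup

# Fetch the directory listing and parse it.
req = Request('http://www.kernel.org/pub/linux/kernel/v3.0/')
response = urlopen(req)
content = response.readlines()
soup = BeautifulSoup(''.join(content))

# Print every link whose text contains the wanted file name.
for link in soup.findAll('a', href=True):
    # link.string is None when the tag contains nested markup, so check it first.
    if link.string and '3.6.7.tar.gz' in link.string:
        print link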

Using that...

>>> from urllib2 import Request, urlopen
>>> from BeautifulSoup import BeautifulSoup
>>> req = Request('http://www.kernel.org/pub/linux/kernel/v3.0/')
>>> response = urlopen(req)
>>> content = response.readlines()
>>> soup = BeautifulSoup(''.join(content))
>>> for link in soup.findAll('a', href=True):
...     if ('3.6.7.tar.gz' in link.string):
...         print link
...
<a href="linux-3.6.7.tar.gz">linux-3.6.7.tar.gz</a>
>>>

If you want to customize the search based on user input, use Python's raw_input() function...
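As a rough sketch of the user-input and regular-expression idea, something like the following might work. Note the assumptions: it is written for Python 3 (where raw_input() is called input()) and uses only the standard library instead of BeautifulSoup. The names LinkCollector, find_kernel_urls and main are made up for this illustration, and the tar.bz2 default is just taken from the example in the question.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = 'http://www.kernel.org/pub/linux/kernel/v3.0/'


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def find_kernel_urls(html, version, extension='tar.bz2', base_url=BASE_URL):
    """Return absolute URLs for linux-<version>[.N...].<extension> links."""
    # re.escape keeps the dots in the user's input literal; (\.\d+)* lets
    # a series such as "3.6" also match point releases such as 3.6.7.
    pattern = re.compile(
        r'^linux-' + re.escape(version) + r'(\.\d+)*\.'
        + re.escape(extension) + r'$'
    )
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, href)
            for href in collector.hrefs
            if pattern.match(href)]


def main():
    # Ask the user which kernel they want, then list the matching archives.
    version = input('Kernel version (e.g. 3.6 or 3.6.7): ').strip()
    page = urlopen(BASE_URL).read().decode('utf-8', 'replace')
    for url in find_kernel_urls(page, version):
        print(url)
```

Calling main() prompts for a version and prints the matching archive URLs. Matching on the href rather than the link text avoids the problem of links whose text is empty, and anchoring the pattern with ^ and $ keeps files such as linux-3.6.7.tar.bz2.sign out of the results.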
