I'm trying to do a little HTML parsing in Python, which, to be quite honest, I'm horrible at. I've been googling ways to do this but can't get anything to work. Here is my situation: I have a web page with a BUNCH of links to downloads. I want to specify a search string and, if a file name on the page contains that string, download the file. It needs to get the entire file name, though. For example, if I am searching for game-1 and the actual file is named game-1-something-else, I want it to download game-1-something-else. I have already used the following code to obtain the source of the page:
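import urllib2

# Fetch the directory page and read its full HTML source into a string
# (named "response" so the built-in "file" is not shadowed)
response = urllib2.urlopen('http://www.example.com/my/example/dir')
dload = response.read()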
This grabs the entire source of the web page, which is just a directory listing. It is full of tags: <a href> tags, <td> tags, etc. I want to strip the tags so all I have is a list of the files in the directory, then use a regular expression or something similar to search for my string, take the entire matching file name, and download it.
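To make the goal concrete, here is a rough sketch of the flow I have in mind, not something I know to be the right approach. It assumes Python 3 (where urllib2 became urllib.request) and that every file in the listing appears as the href of an <a> tag. The names LinkCollector, find_matching_files and download_matches are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlretrieve


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def file_name(link):
    """Return the last path component of a link."""
    return link.rstrip("/").split("/")[-1]


def find_matching_files(html, search):
    """Return the hrefs whose file name contains the search string."""
    parser = LinkCollector()
    parser.feed(html)
    return [link for link in parser.links if search in file_name(link)]


def download_matches(base_url, html, search):
    """Download every matching file into the current directory.

    Assumption: base_url ends with a slash, so that urljoin resolves
    relative links inside the directory rather than next to it.
    """
    for link in find_matching_files(html, search):
        urlretrieve(urljoin(base_url, link), file_name(link))
```

With the page source already in dload (decoded to text), something like download_matches('http://www.example.com/my/example/dir/', dload, 'game-1') would then fetch game-1-something-else under its full name. A plain substring test stands in for the regular expression here; re.search could be swapped in if the pattern needs to be stricter.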