Searching through a website directory, validate, then place URL in a list depending on content

Question

I've been working on a script and thought I would ask for help. I'm looking to search a series of websites and check whether each site is valid. The next step would be to check for specific content on the site. If the site holds that content, place its URL in a list.

import urllib2  # Python 2; in Python 3 this module became urllib.request

National = []
Local = []
Sports = []
Culture = []

BASE_URL = "http://readingeagle.com/section.aspx?id="

def getPage(url):
    # Return the page body, or None if the URL cannot be fetched.
    try:
        response = urllib2.urlopen(urllib2.Request(url))
        return response.read()
    except (urllib2.URLError, ValueError):
        return None

# I would like to set up an iteration to check the entry id from 1-100. If the term is found on the page, place the URL in the list.
def sortPages():
    for i in range(1, 101):
        url = BASE_URL + str(i)
        page = getPage(url)
        if page is not None and "national" in page:
            National.append(url)

if __name__ == "__main__":
    sortPages()
    print(National)

Your question... is not a question! If you want other programmers to help you out, you should be much more specific: what is your problem? What do you think might be wrong? What have you tried so far to find a solution yourself? - mac, Jul 05 '11 at 00:37

score 0 · Answer 1 · edited May 23 '17 at 12:27

Here's my answer to the question of how to validate a given web site: see "python check html valid".
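If "valid" here simply means that the URL can be fetched, a minimal sketch might look like the following. It uses urllib.request, the Python 3 successor to the urllib2 module in the question. The names fetch_page, opener and timeout are hypothetical and only serve the illustration; the opener parameter exists so the function can be tried out without network access.

```python
import urllib.error
import urllib.request


def fetch_page(url, opener=urllib.request.urlopen, timeout=10):
    """Return the decoded page body, or None if the URL cannot be fetched."""
    try:
        with opener(url, timeout=timeout) as response:
            # Assumption: pages are UTF-8; undecodable bytes are replaced.
            return response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, ValueError, OSError):
        # URLError covers HTTP errors such as 404 as well as failed
        # connections; ValueError is raised for a malformed URL.
        return None
```

A None result can then be treated as "invalid, skip this URL" before any content check is attempted.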

For checking the content of the page, the tools range from basic string methods and regex to more sophisticated tools like lxml or BeautifulSoup.
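As a rough illustration of the two simplest options, here is a sketch using a plain string test and a regex. The function names mentions and mentions_word are made up for this example.

```python
import re


def mentions(term, page_text):
    """Case-insensitive substring test using basic string methods."""
    return term.lower() in page_text.lower()


def mentions_word(term, page_text):
    """Regex variant: match the term only as a whole word."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, page_text, re.IGNORECASE) is not None
```

The substring test also fires on "International", since that word contains "national"; the whole-word regex does not. Which behaviour is wanted depends on the pages being searched.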

matchingSites = []
url = "http://readingeagle.com/section.aspx?id=2"  # example URL from the question
matchingSites.append(url)  # Since you asked. :-p
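Putting the pieces together, one possible shape for the whole loop is sketched below. Everything in it is an assumption for illustration: sort_urls is a hypothetical helper, fetch stands for whatever function you use to download a page (returning None for an invalid URL), and fake_pages replaces real downloads so the example runs offline.

```python
def sort_urls(urls, terms, fetch):
    """Group URLs by which terms appear in their page text.

    `fetch` is a caller-supplied function (hypothetical here) that returns
    the page text for a URL, or None when the URL is invalid.
    """
    matches = {term: [] for term in terms}
    for url in urls:
        text = fetch(url)
        if text is None:
            continue  # invalid or unreachable page: skip it
        lowered = text.lower()
        for term in terms:
            if term.lower() in lowered:
                matches[term].append(url)
    return matches


# Stand-in pages so the example runs without network access.
BASE_URL = "http://readingeagle.com/section.aspx?id="
fake_pages = {
    BASE_URL + "1": "National headlines",
    BASE_URL + "2": "Local sports roundup",
}
urls = [BASE_URL + str(i) for i in range(1, 4)]  # id=3 is "invalid" here
by_term = sort_urls(urls, ["national", "local", "sports"], fake_pages.get)
```

A dictionary keyed by term takes the place of the separate National, Local, Sports and Culture lists, so adding a category only means adding a term.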

answered Jul 05 '11 at 04:48

Peter Lyons
