import urllib
import re

# Python 2: read the search term and build the exploit-db search URL
search = raw_input('[!]Search: ')
site = "http://www.exploit-db.com/list.php?description=" + search + "&author=&platform=&type=&port=&osvdb=&cve="
print site
source = urllib.urlopen(site).read()
# Match links of the form href='/exploits/12345'
founds = re.findall("href='/exploits/\d+", source)
print "\n[+]Search", len(founds), "Results\n"
if len(founds) >= 1:
    for found in founds:
        found = found.replace("href='", "")
        print "http://www.exploit-db.com" + found
else:
    print "\nCouldn't find anything with your search\n"

When I search the exploit-db.com site I only get 25 results. How can I make the script go to the next page, or get past the first 25 results?

sourD
  • Using regexps to parse HTML is wrong. See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags and some of the many other threads discussing this topic. – Mike Graham Feb 19 '10 at 19:35
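As the comment above suggests, an HTML parser is more robust than a regex here. A minimal sketch using Python 3's stdlib `html.parser` (the original question is Python 2; the sample HTML below is a hypothetical stand-in for the real exploit-db results page):

```python
# Sketch: extract /exploits/<id> links with html.parser instead of a regex.
from html.parser import HTMLParser
import re

class ExploitLinkParser(HTMLParser):
    """Collects href values that point at /exploits/<id>."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if re.match(r"/exploits/\d+$", href):
                self.links.append("http://www.exploit-db.com" + href)

# Hypothetical sample standing in for the fetched page source.
sample = "<a href='/exploits/123'>one</a> <a href='/other'>x</a>"
parser = ExploitLinkParser()
parser.feed(sample)
print(parser.links)  # → ['http://www.exploit-db.com/exploits/123']
```

Unlike the regex, this keeps working if the site reorders attributes or changes quoting style.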

2 Answers


Easy to check by just visiting the site and looking at the URLs as you manually page: put page=1& right after the ? in the URL to look at the second page of results, page=2& for the third page, and so forth.

How is this a Python question? It's a (very elementary!) "screen scraping" question.
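In code, the tip above amounts to splicing page=N& in right after the ?. A small sketch (the helper name `page_url` and the shortened base URL are illustrative, not from the original script):

```python
# Sketch of the tip above: insert page=N& right after the '?'
# (page=1 is the second page of results, page=2 the third, ...).
def page_url(url, n):
    """Return url with page=n spliced in right after the '?'."""
    head, sep, tail = url.partition("?")
    return head + sep + "page=%d&" % n + tail

base = "http://www.exploit-db.com/list.php?description=ssh&author="
print(page_url(base, 1))
# → http://www.exploit-db.com/list.php?page=1&description=ssh&author=
```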

Alex Martelli
  • Alex, I meant that while it is searching for results on page 1, or in general, it doesn't jump to the second page and doesn't get past 25 results.. not sure what's going on – sourD Feb 19 '10 at 16:31
  • I guess I should have **bolded** the "`page=1&` to look at the **second** page of results" part of my answer since you accepted a later answer (no doubt mine and that one "crossed over the net" since they were posted so close to each other) that gives exactly this information (but adds the word "Attention";-). – Alex Martelli Feb 19 '10 at 16:54

Apparently the exploit-db.com site doesn't allow extending the page size. You therefore need to "manually" page through the result list by repeating the urllib.urlopen() call to fetch subsequent pages. The URL is the same as the one initially used, plus the &page=n parameter. Attention: this n value appears to be 0-based (i.e. &page=1 will give the second page).
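The paging described above can be sketched as a loop that keeps requesting &page=n until a page yields no matches. A Python 3 sketch under that assumption; the fetch function is injectable so the loop is shown with a fake fetcher rather than a live request (in real use it would be something like `urllib.request.urlopen(url).read().decode()`):

```python
# Sketch: loop over &page=n (0-based) until a page returns no results.
import re

def collect_all(base_url, fetch):
    """Accumulate /exploits/<id> links across all result pages."""
    found, page = [], 0
    while True:
        source = fetch(base_url + "&page=%d" % page)
        matches = re.findall(r"href='/exploits/\d+", source)
        if not matches:
            break  # an empty page means we've run out of results
        found.extend(m.replace("href='", "") for m in matches)
        page += 1
    return found

# Fake fetcher simulating two result pages followed by an empty one.
pages = {0: "href='/exploits/1' href='/exploits/2'", 1: "href='/exploits/3'"}
fake = lambda url: pages.get(int(url.rsplit("=", 1)[1]), "")
print(collect_all("http://www.exploit-db.com/list.php?description=ssh", fake))
# → ['/exploits/1', '/exploits/2', '/exploits/3']
```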

mjv