I hate to rain on your parade, but building a lookup list of law schools and then testing whether each name appears in the page source probably will not work. The flawed approach:
schools = []
html = page.read()
for school in school_list:  # renamed so it doesn't shadow the built-in `list`
    if school in html:
        schools.append(school)
Here's why: you're assuming law school names are represented uniformly on lawyer websites, but that assumption isn't reliable. For example, I went to a law school called University of California, Hastings College of the Law. Sometimes it appears on lawyer websites as Hastings College of Law, and at other times as UC Hastings. Often the data about where a lawyer went to school is collected directly from the lawyer, so it will appear verbatim as he or she supplied it. You probably can't assume the data was later normalized.
As a result, any school name that deviates from your lookup list won't be found. To further complicate matters, the shortest version of my school's name, UC Hastings, might even confound a difflib.get_close_matches lookup unless you set the match cutoff very low, and a cutoff that low will likely let in a number of false positives as well.
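To make the cutoff problem concrete, here is a small sketch. The two-item list of official names is made up for illustration, and the 0.3 cutoff is just an arbitrary "very low" value, not a recommendation.

```python
import difflib

# Hypothetical, tiny stand-in for the real lookup list of official names.
official_names = [
    "University of California, Hastings College of the Law",
    "University of Chicago Law School",
]

short_form = "UC Hastings"
hastings = official_names[0]

# Similarity score that get_close_matches compares against its cutoff.
score = difflib.SequenceMatcher(None, hastings, short_form).ratio()

# With the default cutoff of 0.6 the short form matches nothing.
strict = difflib.get_close_matches(short_form, official_names)

# A very low cutoff does find the right school, but it can also admit
# unrelated names, so every hit would need to be checked by hand.
loose = difflib.get_close_matches(
    short_form, official_names, n=len(official_names), cutoff=0.3
)

print(round(score, 2), strict, loose)
```

The short form shares so few characters with the full name that the score stays well under the default cutoff, which is why fuzzy matching against the official list alone isn't enough.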
Here's my advice. Spider a list of all law school names and put it in a database table. Create a second table with known deviations from the list. Each time you spider a site, try a basic set membership test against the lookup list (or a dynamically generated regex). In the probable event that such a lookup fails, make the script throw an error and print the unmatched school to the console. Add that school to the table of known variants and key it to the correct school name in the main lookup table. Repeat this process until you feel confident you have most variants accounted for. From there, add a hack to check unfound school names against a list of the official lookup items and all known variants using difflib.get_close_matches.
Use this kind of method to return the closest valid match any time a school isn't found. It may be the best your clients can ask for. I use Django for this kind of thing because the built-in database admin makes it easy to add known variants.
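A rough sketch of that lookup, under a few assumptions: the two database tables are replaced by an in-memory set and dict, the names resolve_school, OFFICIAL_SCHOOLS and KNOWN_VARIANTS are invented for the example, the 0.8 cutoff is a guess you would tune, and an unmatched name is printed rather than raised as an error so the fuzzy fallback can still run.

```python
import difflib

# Hypothetical stand-in for the main lookup table of official names.
OFFICIAL_SCHOOLS = {
    "University of California, Hastings College of the Law",
    "Harvard Law School",
}

# Hypothetical stand-in for the variants table: variant -> official name.
KNOWN_VARIANTS = {
    "UC Hastings": "University of California, Hastings College of the Law",
    "Hastings College of Law": "University of California, Hastings College of the Law",
}


def resolve_school(raw_name, cutoff=0.8):
    """Return the official school name for raw_name, or None if nothing is close."""
    name = raw_name.strip()

    # 1. Exact match against the official list.
    if name in OFFICIAL_SCHOOLS:
        return name

    # 2. Exact match against variants collected from earlier runs.
    if name in KNOWN_VARIANTS:
        return KNOWN_VARIANTS[name]

    # 3. Report the miss so it can be added to the variants table by hand.
    print(f"UNMATCHED: {name!r}")

    # 4. Fuzzy fallback over official names plus all known variants.
    candidates = list(OFFICIAL_SCHOOLS) + list(KNOWN_VARIANTS)
    close = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    if not close:
        return None

    # If the best hit is a variant, map it back to its official name.
    best = close[0]
    return KNOWN_VARIANTS.get(best, best)


print(resolve_school("UC Hastings"))
print(resolve_school("Hastings College of the Law"))
```

Matching against the variants as well as the official names is what lets the fuzzy step work with a fairly strict cutoff: a new spelling only has to be close to some spelling you have already seen, not to the full official name.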