- I need to parse an HTML page and extract all the URLs that meet my requirement.
- Next, I need to parse each of the extracted URLs and, if the page title matches something, get the data I want and save it to multiple files based on their names.

I have done part 1 in the following way:
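
    import re

    # soup1 holds the page HTML as a string; links is an open, writable file
    pattern = re.compile(r'''class="topline"><A href="(.*?)"''')
    da = pattern.findall(soup1)  # list of captured href values

    # Pad every URL to the length of the longest one
    col_width = max(len(row) for row in da)
    for row in da:
        # Compare in upper case so the match is case-insensitive
        if "SOME STRING" in row.upper():
            bb = row.ljust(col_width)
            links.write(bb + "\n")
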
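
Part 2 could be sketched roughly as below. This is only an illustration under several assumptions, not a tested solution: it uses Python 3, the helper names `fetch_page`, `extract_title`, `safe_filename` and `save_matching_pages` are hypothetical, each page is assumed to have its title in a `<title>` tag, and "the data" is taken to be the whole page, since the real extraction step is not specified here.

```python
import os
import re
import urllib.request

# Assumption: the page title is the text of the first <title> element.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def fetch_page(url):
    """Download a page and return its HTML as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_title(html):
    """Return the first <title> text with whitespace collapsed, or ""."""
    match = TITLE_RE.search(html)
    return " ".join(match.group(1).split()) if match else ""


def safe_filename(title):
    """Replace characters that are awkward in file names with underscores."""
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", title).strip("_")
    return name or "untitled"


def save_matching_pages(urls, keyword, out_dir, fetch=fetch_page):
    """Save each page whose title contains keyword (case-insensitive).

    Every matching page is written to out_dir/<title>.html.
    Returns the list of paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for url in urls:
        html = fetch(url)
        title = extract_title(html)
        if keyword.upper() not in title.upper():
            continue  # title does not match, skip this page
        path = os.path.join(out_dir, safe_filename(title) + ".html")
        with open(path, "w", encoding="utf-8") as out:
            out.write(html)  # placeholder: write the extracted data here
        written.append(path)
    return written
```

With the list from part 1, a call might look like `save_matching_pages(da, "some string", "pages")`. Two pages with the same title would overwrite each other in this sketch, so a real version would probably need to make the file names unique.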
I'd truly appreciate any help. Thank you.