I'm crawling a series of webpages and organising their content into an in-memory knowledge base. I need to execute different code depending on my string input, which is crawled from a website's headings.
tags = browser.find_elements_by_xpath("//div[@class='main-content-entry']/h2")
for tag in tags:
    heading = tag.get_attribute("textContent").lower().strip()
    content = tag.parent
    if heading.find("overview") != -1:
        # do this
    elif heading.find("takeaways") != -1:
        # do that
    # do more elifs
    else:
        # do something else
Right now, I have it implemented as an if-elif-else statement. I've seen answers around the site suggesting the use of dicts, but from what I can tell that's dependent on the input being an exact match to the key. In my case, however, exact matches are not always possible due to inconsistencies on the website owner's part.
The pages are structured enough that I know what the heading names are, so I can define the "keys" in advance in my code. However, some headings have typos and slight variants across the hundred-plus pages. For example:
- Fees & Funding
- Fees
- Fees &Funding
- Certificates
- Certificate
- Certificat & Exams
- Exams & Certificates
The best I can do at the moment is to scan through the pages once, identify the entire set of headings, and then manually define the substrings to use in my code so that I don't repeat near-identical branches.
Considering the above, is there a better way than iterating through a chained if-elif-else statement?
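For what it's worth, that first scan could look roughly like the sketch below. It assumes the headings of each page have already been extracted into lists of strings; `collect_headings` and the sample data are hypothetical names for illustration, not part of my actual crawler.

```python
from collections import Counter


def collect_headings(pages):
    """Count every distinct heading across pages after light normalisation.

    `pages` is assumed to be an iterable of lists of raw heading strings.
    """
    counts = Counter()
    for headings in pages:
        for raw in headings:
            # Lowercase and collapse runs of whitespace so trivial variants merge.
            counts[" ".join(raw.lower().split())] += 1
    return counts


# Hypothetical sample standing in for the crawled pages.
sample_pages = [
    ["Overview", "Fees & Funding"],
    ["overview ", "Fees &Funding"],
    ["Fees"],
]

heading_counts = collect_headings(sample_pages)
for heading, count in heading_counts.most_common():
    print(count, heading)
```

Normalisation of this kind merges case and whitespace differences, but variants such as "fees &funding" still show up as separate entries, which is why I end up picking substrings by hand.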
Edit
The suggested answers in "Replacements for switch statement in Python?" don't work in my situation. Take for example:
def do_this(heading):
    return {
        "overview": do_overview(),
        "fees": do_fees(),
        # ...
    }[heading]
This would have been the suggested implementation by that question's answers. But how do I return do_fees() when heading is "fees & funding", "fees", "fees &funding", etc.? I need to execute the correct function if the key value is a substring of heading.
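To make the requirement concrete, here is a minimal sketch of the behaviour I'm after: a lookup that checks whether each key is a substring of the heading, rather than requiring an exact match. The handler functions, their return values, the `HANDLERS` list and `dispatch` are all hypothetical placeholders, and the key "fee" is just one substring I might choose by hand.

```python
def do_overview(content):
    return "overview handled"


def do_fees(content):
    return "fees handled"


def do_default(content):
    return "unrecognised heading"


# Ordered (substring, handler) pairs; the first substring found in the heading wins.
# Functions are stored uncalled, so only the matching handler ever runs.
HANDLERS = [
    ("overview", do_overview),
    ("fee", do_fees),
]


def dispatch(heading, content=None):
    """Run the handler whose key is a substring of the normalised heading."""
    normalised = heading.lower().strip()
    for key, handler in HANDLERS:
        if key in normalised:
            return handler(content)
    return do_default(content)


for raw in ["Fees & Funding", "Fees", "Fees &Funding", "Overview", "Contact"]:
    print(raw, "->", dispatch(raw))
```

This still walks the keys in order, so it is essentially the if-elif chain expressed as data; what I don't know is whether there is a cleaner or more robust approach than this.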