I am currently writing a script that parses an XML page using BeautifulSoup. An example of the XML file is here. The script outputs the first product URL (taken from the 'loc' tags) that matches every keyword in an inputted list. Currently, the script's control flow is the following:
- Pass the URL into a soup object and beautify it.
- Run a for loop over each 'url' tag, and put each 'loc' text into a list (inventory_url):
    for item in soup.find_all('url'):
        inventory_url.append(item.find('loc').text)
- Iterate through the list and output the first element that matches all keywords, where 'keywords' is the inputted list of keywords:
    for item in inventory_url:
        if all(kw in item for kw in keywords):
            return item
I am wondering if there is a way to make the parsing faster. I have looked at SoupStrainer, but when I restrict it to only 'loc' tags, it also picks up 'image:loc' tags, which I do not need.
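To make the 'loc' versus 'image:loc' distinction concrete, here is a minimal sketch of the same flow. It swaps in the standard library's xml.etree.ElementTree instead of BeautifulSoup, so it is not my current code, and I have not measured whether it is faster. The sample XML, the function name first_matching_url, and the assumption that the real file declares the standard sitemap namespace are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample sitemap, made up for illustration only.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/products/red-shirt</loc>
    <image:image>
      <image:loc>https://example.com/img/red-shirt.jpg</image:loc>
    </image:image>
  </url>
  <url>
    <loc>https://example.com/products/blue-wool-hat</loc>
    <image:image>
      <image:loc>https://example.com/img/blue-wool-hat.jpg</image:loc>
    </image:image>
  </url>
</urlset>
"""

# Assumption: the file uses the standard sitemap namespace for plain <loc> tags.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def first_matching_url(xml_text, keywords):
    """Return the first sitemap <loc> URL containing every keyword, or None."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Matching on the namespace-qualified tag skips <image:loc>, because that
    # element lives in a different namespace.
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if all(kw in url for kw in keywords):
            return url  # stop at the first match; no intermediate list
    return None


print(first_matching_url(SAMPLE_XML, ["blue", "hat"]))
```

The sketch returns as soon as a URL matches rather than first collecting every 'loc' into a list, which is a separate change from the choice of parser.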
Thank you very much.