While building a scraper, I ran into a situation where I have too many keywords to hard-code. I would like to build a regular expression from a "dictionary" file that contains the keywords, so that when the crawler/scraper matches one of them on a page, it scrapes the whole paragraph containing that keyword.
For a single keyword, the paragraph-scraping code currently looks like this:
    for Keyword in response.xpath('//*'):
        yield {
            'dictA': Keyword.xpath('//p/text()[contains(..,"Specific Keyword/s")]').extract(),
        }
This gets me the whole paragraph that contains "Specific Keyword/s". But I have around 100 keywords, and I don't want to repeat this for every one of them:
dictA1
.
.
.
dictA100
That would be inefficient. How could I go about this? As always, hints and pointers are welcome.
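To make the idea concrete, here is a rough sketch of the direction I have in mind, not a working spider. It assumes the dictionary file holds one keyword or phrase per line, that keywords begin and end with a letter or digit (so `\b` word boundaries behave), and that the paragraphs have already been extracted as plain strings. The names `load_keywords`, `build_pattern`, `matching_paragraphs` and the file name `keywords.txt` are hypothetical, made up for illustration.

```python
import re


def load_keywords(path):
    """Read one keyword or phrase per line, skipping blank lines (assumed file layout)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def build_pattern(keywords):
    """Compile one case-insensitive regex that matches any keyword as a whole word."""
    # Escape each keyword so characters like '.' or '/' are treated literally,
    # and put longer phrases first so they are tried before shorter ones.
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def matching_paragraphs(paragraphs, pattern):
    """Return only the paragraphs that contain at least one keyword."""
    return [text for text in paragraphs if pattern.search(text)]


# Inline keywords for the demo; in practice: keywords = load_keywords("keywords.txt")
keywords = ["solar panel", "wind turbine"]
pattern = build_pattern(keywords)

# Stand-in for paragraph text taken from a response.
sample = [
    "Our new Solar Panel range ships next month.",
    "Contact us for a quote.",
    "Each wind turbine is inspected yearly.",
]
matched = matching_paragraphs(sample, pattern)
```

In a spider, I assume the paragraph strings could come from something like `response.xpath('//p').xpath('string(.)').getall()`, and `parse()` could then yield a single item such as `{'paragraphs': matched}` instead of one field per keyword.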