Is there a library that can take a text (such as an HTML document) and a list of sample strings (such as the names of some products), infer a pattern from those samples, and generate a regular expression that extracts every string in the text matching that pattern?
For example, given the following HTML:
<table>
<tr>
<td>Product 1</td>
<td>Product 2</td>
<td>Product 3</td>
<td>Product 4</td>
<td>Product 5</td>
<td>Product 6</td>
<td>Product 7</td>
<td>Product 8</td>
</tr>
</table>
and the following list of strings:
['Product 1', 'Product 2', 'Product 3']
I'd like a function that would build a regex like the following:
'<td>(.*?)</td>'
and then extract everything in the HTML that matches the regex. In this case, the output would be:
['Product 1', 'Product 2', 'Product 3', 'Product 4', 'Product 5', 'Product 6', 'Product 7', 'Product 8']
CLARIFICATION:
I'd like the function to look at the text surrounding the samples, not at the samples themselves. So, for example, if the HTML were:
<tr>
<td>Word</td>
<td>More words</td>
<td>101</td>
<td>-1-0-1-</td>
</tr>
and the samples were ['Word', 'More words'],
I'd like it to extract:
['Word', 'More words', '101', '-1-0-1-']
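To make the behaviour I'm after concrete, here is a rough Python sketch of the idea, not a real library. The name `infer_regex` is made up for illustration. It assumes each sample sits on a single line, looks only at the text on that same line, and takes the longest context shared by all samples as the delimiters, so it would likely break on real-world HTML:

```python
import os
import re


def infer_regex(text, samples):
    """Guess a regex from the text surrounding each sample (hypothetical helper).

    Assumption: only the context on the sample's own line is considered.
    """
    lefts, rights = [], []
    for sample in samples:
        start = text.find(sample)
        while start != -1:
            end = start + len(sample)
            line_start = text.rfind("\n", 0, start) + 1  # 0 if no newline before
            line_end = text.find("\n", end)
            if line_end == -1:
                line_end = len(text)
            lefts.append(text[line_start:start])   # text before the sample
            rights.append(text[end:line_end])      # text after the sample
            start = text.find(sample, end)
    if not lefts:
        raise ValueError("no sample occurs in the text")
    # Longest shared suffix of the left contexts (via reversed strings)
    # and longest shared prefix of the right contexts.
    left = os.path.commonprefix([s[::-1] for s in lefts])[::-1]
    right = os.path.commonprefix(rights)
    if not left or not right:
        raise ValueError("samples share no common surrounding text")
    return re.escape(left) + "(.*?)" + re.escape(right)


html = "<tr>\n<td>Word</td>\n<td>More words</td>\n<td>101</td>\n<td>-1-0-1-</td>\n</tr>"
pattern = infer_regex(html, ["Word", "More words"])
print(pattern)                    # '<td>(.*?)</td>' on Python 3.7+
print(re.findall(pattern, html))  # ['Word', 'More words', '101', '-1-0-1-']
```

The samples themselves never enter the pattern; only their shared left and right context does, which is why '101' and '-1-0-1-' are picked up as well. What I'm hoping for is a library that does this more robustly.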