You can try using the :contains pseudo-class with CSS selectors:
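from bs4 import BeautifulSoup

html = '''
<td>the keyword is present in the <a href='text' title='text'>text</a> </td>
<td>word key is not present</td>
<td>no keyword here</td>'''
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')
print(soup.select('td:contains("keyword")'))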
>>> [<td>the keyword is present in the <a href="text" title="text">text</a> </td>,
<td>no keyword here</td>]
EDIT
With newer versions of BS4 (Soup Sieve 2.1 and later), :contains has been deprecated. You can use :-soup-contains() or :-soup-contains-own() instead.
from bs4 import BeautifulSoup as bs

html = """<table><tr>
<td>the keyword is present in the <a href='text' title='text'>text</a> </td>
<td>word key is not present</td>
<td>no keyword here</td></tr>
</table>"""
soup = bs(html, 'html.parser')
variable = "keyword"
# Quote the value so that keywords containing spaces still form a valid selector
print(soup.select(f'td:-soup-contains("{variable}")'))
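The two pseudo-classes differ in which text they look at. As a minimal sketch (the second table cell is made up for illustration and is not part of the original example): :-soup-contains() checks all text inside the element, including text in child tags, while :-soup-contains-own() checks only the text nodes that belong directly to the element.

```python
from bs4 import BeautifulSoup

# Illustrative markup: in the second cell the keyword sits only inside the <a> child
html = """<table><tr>
<td>the keyword is present in the <a href='text' title='text'>text</a> </td>
<td>see the <a href='text'>keyword</a> link</td>
</tr></table>"""
soup = BeautifulSoup(html, "html.parser")
variable = "keyword"

# Matches both cells: descendant text is searched as well
any_depth = soup.select(f'td:-soup-contains("{variable}")')
# Matches only the first cell: the second cell's own text lacks the keyword
own_only = soup.select(f'td:-soup-contains-own("{variable}")')

print(len(any_depth), len(own_only))
```

Use :-soup-contains-own() when a match inside a nested tag should not count for the parent.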
The variable above can be passed in from the command line, for example with argparse:
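import argparse

from bs4 import BeautifulSoup as bs

parser = argparse.ArgumentParser()
# required=True so the selector is never built from a missing (None) keyword
parser.add_argument('--keyword', required=True, help='Add some keyword to search')
args = parser.parse_args()
keyword = args.keyword

html = """<table><tr>
<td>the keyword is present in the <a href='text' title='text'>text</a> </td>
<td>word key is not present</td>
<td>no keyword here</td></tr>
</table>"""
# 'html5lib' needs the html5lib package installed; 'html.parser' also works here
soup = bs(html, 'html5lib')
# Quote the value so that keywords containing spaces still form a valid selector
print(soup.select(f'td:-soup-contains("{keyword}")'))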
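A keyword typed on the command line may contain spaces or quote characters, which would break the selector unless the value is quoted and escaped. The sketch below is one possible way to handle that. The helper name css_quote is hypothetical (it is not part of bs4 or Soup Sieve), and it assumes the keyword contains no newlines.

```python
from bs4 import BeautifulSoup


def css_quote(text):
    """Hypothetical helper: wrap text as a double-quoted CSS string literal."""
    # Escape backslashes first, then double quotes, using CSS backslash escapes
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


html = """<table><tr>
<td>the keyword string is present in the text</td>
<td>the_word is here</td>
</tr></table>"""
soup = BeautifulSoup(html, "html.parser")

keyword = "keyword string"  # contains a space, so it has to be quoted
matches = soup.select(f"td:-soup-contains({css_quote(keyword)})")
print(matches)
```

Note that the variable's value is interpolated into the selector. Writing the variable's name inside the selector string, as in 'td:-soup-contains(the_word)', searches for the literal text the_word instead.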
html = '''
<p>the keyword string is present in the text</p>
<p>the_word is here</p>
'''
the_word = 'keyword string'
soup = BeautifulSoup(html)
print(soup.select('p:contains(the_word)'))
# it prints [<p>the keyword string is present in the text</p>]
– Sep 11 '21 at 16:38