How can I get or print only those lines of a long multiline text inside a single <p> tag that contain a certain string? On the website the lines are separated by <br> tags, and there is no closing </p> tag.

Basic structure of the website:

<p style="line-height: 150%">
I need a big cup of coffee and cookies.
<br>
I do not like tea with milk.
<br>
I can't live without coffee and cookies.
<br>
...

Let's assume I want to get or print only the lines containing the phrase "coffee and cookies". In this case only the first and third lines (sentences) of this <p> should be printed.

I have Beautiful Soup 4.6.3 installed under Python 3.7.1.

findAll seems to be tag-oriented and returns the whole <p>, right? So how can I do this? Maybe with a regex or some other pattern?

3 Answers

Iterate over the children of the <p> and convert each bs4 element to a string with str(); then you can check whether it contains "coffee and cookies".

from bs4 import BeautifulSoup

html_doc = """<p style="line-height: 150%">
    I need a big cup of coffee and cookies. <a href="aaa">aa</a>
    <br>
    I do not like tea with milk.
    <br>
    I can't live without coffee and cookies.
    <br>"""

soup = BeautifulSoup(html_doc, 'html.parser')
paragraph = soup.find('p')

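for node in paragraph:
  # Children are text nodes (which subclass str) or tags such as <a> and <br>;
  # only text nodes can hold a line, so tags are skipped here.
  if isinstance(node, str) and 'coffee and cookies' in str(node):
    # Append the link only when it directly follows this text node,
    # so a link from a later line is never attached to this one.
    following = node.next_sibling
    if getattr(following, 'name', None) == 'a':
      print(node.strip() + ' ' + str(following))
    else:
      print(node.strip())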
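
If a line can contain several inline tags, it may be easier to group the children between <br> tags and compare the visible text of each group. The following is only a sketch under that assumption; lines_between_br is a hypothetical helper name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html_doc = """<p style="line-height: 150%">
    I need a big cup of coffee and cookies. <a href="aaa">aa</a>
    <br>
    I do not like tea with milk.
    <br>
    I can't live without coffee and cookies.
    <br>"""


def lines_between_br(paragraph):
    """Yield the visible text of each <br>-delimited line (hypothetical helper)."""
    parts = []
    for child in paragraph.children:
        if isinstance(child, Tag) and child.name == 'br':
            # A <br> ends the current line: emit it with whitespace collapsed.
            yield ' '.join(''.join(parts).split())
            parts = []
        elif isinstance(child, Tag):
            # Inline tag such as <a>: keep only its text.
            parts.append(child.get_text())
        else:
            # Plain text node.
            parts.append(str(child))
    # Whatever follows the last <br> (possibly an empty string).
    yield ' '.join(''.join(parts).split())


soup = BeautifulSoup(html_doc, 'html.parser')
matches = [line for line in lines_between_br(soup.find('p'))
           if 'coffee and cookies' in line]
for line in matches:
    print(line)
```

Here the link text is kept as plain text rather than as markup, which is a design choice of this sketch and differs from the loop above.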
ewwink

If I understand your requirement correctly, the following snippet should get you there:

from bs4 import BeautifulSoup

htmlelem = """
    <p style="line-height: 150%">
    I need a big cup of coffee and cookies.
    <br>
    I do not like tea with milk.
    <br>
    I can't live without coffee and cookies.
    <br>
"""

soup = BeautifulSoup(htmlelem, 'html.parser')
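for paragraph in soup.find_all('p'):
    # The <br> tags split the paragraph into one text node per line;
    # stripped_strings yields each of them without surrounding whitespace.
    for line in paragraph.stripped_strings:
        if "coffee and cookies" in line:
            print(line)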
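
Since the question also asks about regex: find_all accepts a compiled pattern through its string argument and then returns the matching text nodes instead of tags. A minimal sketch on the same sample markup (the variable names pattern and matching are only illustrative):

```python
import re
from bs4 import BeautifulSoup

htmlelem = """
    <p style="line-height: 150%">
    I need a big cup of coffee and cookies.
    <br>
    I do not like tea with milk.
    <br>
    I can't live without coffee and cookies.
    <br>
"""

soup = BeautifulSoup(htmlelem, 'html.parser')
pattern = re.compile(r'coffee and cookies')

# string=pattern selects text nodes whose content matches the regex;
# strip() removes the indentation and newlines taken over from the source.
matching = [text.strip() for text in soup.find('p').find_all(string=pattern)]
for line in matching:
    print(line)
```

This relies on each line being its own text node, which holds here because the <br> tags separate them.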
SIM

Can you split on \n? This works when each sentence is on its own line in the HTML source, as in your example.

from bs4 import BeautifulSoup

html = """
    <p style="line-height: 150%">
    I need a big cup of coffee and cookies.
    <br>
    I do not like tea with milk.
    <br>
    I can't live without coffee and cookies.
    <br>
"""

soup = BeautifulSoup(html, 'html.parser')
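for item in soup.select('p'):
    # .text keeps the newlines of the HTML source, so each sentence becomes one list element.
    lines = item.text.split('\n')
    for line in lines:
        if "coffee and cookies" in line:
            print(line.strip())  # drop the indentation taken over from the source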
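
If the real page has no newlines in its source, splitting on \n finds nothing to split on. One possible workaround, sketched here on an assumed single-line variant of the sample markup, is to replace every <br> with a newline first and split afterwards:

```python
from bs4 import BeautifulSoup

# Assumed variant of the sample: the same markup without any source newlines.
html = ('<p style="line-height: 150%">I need a big cup of coffee and cookies.'
        '<br>I do not like tea with milk.'
        "<br>I can't live without coffee and cookies.<br>")

soup = BeautifulSoup(html, 'html.parser')
found = []
for item in soup.select('p'):
    # Turn every <br> into a real newline so the text can be split on it.
    for br in item.find_all('br'):
        br.replace_with('\n')
    for line in item.get_text().split('\n'):
        if 'coffee and cookies' in line:
            found.append(line.strip())

for line in found:
    print(line)
```

Note that replace_with modifies the parsed tree in place, so parse the document again if the original <br> tags are needed later.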
QHarr