I'm writing a script using BeautifulSoup to extract text from <p> elements; it works well until I encounter a <p> element that contains <br> tags, in which case it only captures the text BEFORE the first <br> tag. How can I edit my code to capture all of the text?
My code:
coms = soup.select('li > div[class=comments]')[0].select('p')
inp = [i.find(text=True).lstrip().rstrip() for i in coms]
The problem HTML (note the <br> tags):
<p>
Alts called now through 53. No more will be called til the 12:50 group. EMCs are still on the table to be seen.<br>
<br>
ITR info:<br>
<br>
Rachel Hoffman, CD<br>
Chris Kory, acc.<br>
<br>
Monitor is Iftiaz Haroon. </p>
What my code currently outputs:
>> 'Alts called now through 53. No more will be called til the 12:50 group. EMCs are still on the table to be seen.'
What my code SHOULD output (note the extra text):
>> 'Alts called now through 53. No more will be called til the 12:50 group. EMCs are still on the table to be seen. ITR info: Rachel Hoffman, CD Chris Kory, acc. Monitor is Iftiaz Haroon.'
(Note: Forgive my sometimes-questionable terminology; I'm largely self-taught.)
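For reference, here is a minimal sketch of one way this might be addressed. find(text=True) returns only the first text node inside the element, which is why everything after the first <br> is dropped. get_text() collects all of the text nodes instead; passing a separator and strip=True joins the stripped pieces with single spaces and skips the whitespace-only ones. The <ul>/<li>/<div> wrapper below is a hypothetical stand-in for the real page, which isn't shown, and html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical wrapper markup; only the <p> content comes from the question.
html = """
<ul><li><div class="comments">
<p>
Alts called now through 53. No more will be called til the 12:50 group. EMCs are still on the table to be seen.<br>
<br>
ITR info:<br>
<br>
Rachel Hoffman, CD<br>
Chris Kory, acc.<br>
<br>
Monitor is Iftiaz Haroon. </p>
</div></li></ul>
"""

soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption
coms = soup.select('li > div[class=comments]')[0].select('p')

# get_text() gathers every text node in each <p>, not just the first one.
# strip=True trims each piece and drops the empty ones; " " joins the rest.
inp = [p.get_text(" ", strip=True) for p in coms]
print(inp[0])
```

" ".join(p.stripped_strings) should give the same result if you'd rather build the string yourself.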