How can I use BeautifulSoup to select the content that immediately follows each div.title as its sibling, when that content is not enclosed in a tag of its own?
In the example below, I need to retrieve:
[Text I care about which <b>can</b> have formatting...,
Text I care about.,
Text I care about <span class='someclass'>which can be in a span</span>...]
Example
<div class="level1">
    <div class="title">
        Title I do not care about
    </div>
    <div class="level2">
        <div class="title">
            Title I do not care about
        </div>
        Text I care about which <b>can</b> have formatting...
    </div>
    <div class="level2">
        <div class="title">
            Title I do not care about
        </div>
        <div class="level3">
            <div class="title">
                Title I do not care about
            </div>
            Text I care about.
        </div>
        <div class="level3">
            <div class="title">
                Title I do not care about
            </div>
            Text I care about <span class='someclass'>which can be in a span</span>...
        </div>
    </div>
</div>
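A minimal sketch of one way to do the extraction, assuming bs4 4.7+ (for CSS `select`) and the built-in `html.parser`. For each `div.title` it walks `next_siblings` and keeps every node up to the next `<div>` sibling, serializing tags with `str()` so inline markup survives. The function name `content_after_titles` and the "stop at the next `<div>`" rule are my own assumptions about the structure, not something BeautifulSoup prescribes.

```python
from bs4 import BeautifulSoup, Tag

HTML = """
<div class="level1">
    <div class="title">Title I do not care about</div>
    <div class="level2">
        <div class="title">Title I do not care about</div>
        Text I care about which <b>can</b> have formatting...
    </div>
    <div class="level2">
        <div class="title">Title I do not care about</div>
        <div class="level3">
            <div class="title">Title I do not care about</div>
            Text I care about.
        </div>
        <div class="level3">
            <div class="title">Title I do not care about</div>
            Text I care about <span class='someclass'>which can be in a span</span>...
        </div>
    </div>
</div>
"""


def content_after_titles(soup):
    """Return the markup that follows each div.title, up to the next <div>."""
    results = []
    for title in soup.select("div.title"):
        parts = []
        for sib in title.next_siblings:
            # Stop at the next block-level <div> (e.g. a nested div.level3).
            if isinstance(sib, Tag) and sib.name == "div":
                break
            # str() keeps inline tags such as <b>, <br> and <span> as markup.
            parts.append(str(sib))
        text = "".join(parts).strip()
        if text:  # skip titles followed only by whitespace
            results.append(text)
    return results


soup = BeautifulSoup(HTML, "html.parser")
for item in content_after_titles(soup):
    print(item)
```

Note that bs4 re-serializes the markup rather than returning the original bytes: attribute quotes are normalized (`class='someclass'` comes back as `class="someclass"`) and void tags such as `<br>` come back as `<br/>`, so a regex applied afterwards should not depend on the exact original spelling of the tags.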
Please note that I will need to modify the text at specific positions using a regex. Therefore, I need the entire text including its formatting tags (<b>, <br>, <span>, etc.), not just the plain text.
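Since the text has to be modified with a regex, one possible approach (a sketch under stated assumptions, not a tested recipe for every document) is: serialize the nodes after each title to a string, run `re.sub` on it, re-parse the result with `BeautifulSoup(..., "html.parser")`, and swap the new nodes in for the old ones using `extract()` and `insert_after()`. The helpers `siblings_after` and `rewrite_after_titles`, the shortened sample HTML, and the stop-at-next-`<div>` rule are hypothetical and only for illustration. The regex sees raw markup, so the pattern must be written so that it cannot match inside tag names or attributes.

```python
import re

from bs4 import BeautifulSoup, Tag

HTML = """
<div class="level2">
    <div class="title">Title I do not care about</div>
    Text I care about which <b>can</b> have formatting...
</div>
<div class="level3">
    <div class="title">Title I do not care about</div>
    Text I care about <span class='someclass'>which can be in a span</span>...
</div>
"""


def siblings_after(title):
    """Hypothetical helper: sibling nodes after `title`, up to the next <div>."""
    nodes = []
    for sib in title.next_siblings:
        if isinstance(sib, Tag) and sib.name == "div":
            break
        nodes.append(sib)
    return nodes


def rewrite_after_titles(soup, pattern, repl):
    """Apply re.sub(pattern, repl) to the markup after each div.title, in place."""
    for title in soup.select("div.title"):
        old_nodes = siblings_after(title)  # snapshot before mutating the tree
        markup = "".join(str(node) for node in old_nodes)
        if not markup.strip():
            continue  # only whitespace follows this title
        new_markup = re.sub(pattern, repl, markup)
        fragment = BeautifulSoup(new_markup, "html.parser")
        for node in old_nodes:
            node.extract()  # detach the original text and inline tags
        anchor = title
        for node in list(fragment.contents):  # copy: insert_after moves nodes
            anchor.insert_after(node)
            anchor = node
    return soup


soup = BeautifulSoup(HTML, "html.parser")
rewrite_after_titles(soup, r"\bcare\b", "really care")
print(soup)
```

The titles also contain the word "care", but they are left untouched because only the nodes collected after each `div.title` are passed to `re.sub`.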