I would avoid nextSibling because, going by your question, you want to include everything up to the next <a>, regardless of whether it sits in a sibling, parent, or child element.
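To illustrate the difference, here is a minimal sketch. The markup is hypothetical (it is not the HTML from the question); it just puts the next <a> under a different parent than the <big> tag, so the sibling lookup never reaches it while `find_next` does.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the next <a> lives in a different parent than <big>.
soup = BeautifulSoup(
    '<p><big>Title</big> intro</p><div><a href="#x">link</a></div>',
    'html.parser',
)
big = soup.find('big')

# Siblings are limited to the children of the same parent (<p> here).
siblings = list(big.next_siblings)

# find_next searches everything later in the document, whatever the nesting.
next_a = big.find_next('a')

print(siblings)
print(next_a)
```

Here the sibling list only contains the text that follows <big> inside the same <p>, whereas `find_next('a')` returns the link nested in the <div>.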
Therefore I think the best approach is to find the node that is the next <a> element and recurse through the elements until you reach it, adding each string as it is encountered. You may need to tidy up the code below if your HTML is vastly different from the sample, but something like this should work:
from bs4 import BeautifulSoup

# `html` is the HTML string from the question.
soup = BeautifulSoup(html, 'html.parser')
firstBigTag = soup.find_all('big')[0]
nextATag = firstBigTag.find_next('a')

def loopUntilA(text, firstElement):
    # .string is None for tags without a single string child, so fall back to ''
    text += firstElement.string or ''
    # Using double next_element to skip the string nodes themselves
    following = firstElement.next_element.next_element
    if following is nextATag:
        return text
    else:
        return loopUntilA(text, following)

targetString = loopUntilA('', firstBigTag)
print(targetString)
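If the double-next stepping turns out to be too fragile for your markup (it assumes every tag is followed directly by its own string), a non-recursive variant is to walk `next_elements` and keep only the string nodes. This is just a sketch of that alternative, not the approach above: `sample_html` and `text_until_next_a` are names I made up for illustration, and the markup is hypothetical rather than the HTML from the question.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample markup; the question's actual HTML is not reproduced here.
sample_html = (
    '<p><a href="#one">one</a><big>Hello</big> <i>there</i>, '
    '<b>world</b><br/></p><p>tail <a href="#two">two</a></p>'
)

def text_until_next_a(start_tag):
    """Join every string node between start_tag and the next <a>, in document order."""
    stop_tag = start_tag.find_next('a')  # None if no <a> follows
    parts = []
    for element in start_tag.next_elements:
        if element is stop_tag:
            break
        # Tags are skipped here; their text arrives later as separate string nodes.
        if isinstance(element, NavigableString):
            parts.append(str(element))
    return ''.join(parts)

soup = BeautifulSoup(sample_html, 'html.parser')
result = text_until_next_a(soup.find('big'))
print(result)
```

Because it only looks at string nodes, this version does not care how deeply the text is nested or whether a tag such as <br> has no string at all. Note that if no <a> follows the starting tag, it collects everything to the end of the document.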