I got the following information from EDGAR:
<SERIES-AND-CLASSES-CONTRACTS-DATA>
<EXISTING-SERIES-AND-CLASSES-CONTRACTS>
<SERIES>
<OWNER-CIK>0000074663
<SERIES-ID>S000004984
<SERIES-NAME>Eaton Vance Income Fund of Boston
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000013484
<CLASS-CONTRACT-NAME>Eaton Vance Income Fund of Boston Class A
<CLASS-CONTRACT-TICKER-SYMBOL>EVIBX
</CLASS-CONTRACT>
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000013485
<CLASS-CONTRACT-NAME>Eaton Vance Income Fund of Boston Class B
<CLASS-CONTRACT-TICKER-SYMBOL>EBIBX
</CLASS-CONTRACT>
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000013486
<CLASS-CONTRACT-NAME>Eaton Vance Income Fund of Boston Class C
<CLASS-CONTRACT-TICKER-SYMBOL>ECIBX
</CLASS-CONTRACT>
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000013487
<CLASS-CONTRACT-NAME>Eaton Vance Income Fund of Boston Class R
<CLASS-CONTRACT-TICKER-SYMBOL>ERIBX
</CLASS-CONTRACT>
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000013488
<CLASS-CONTRACT-NAME>Eaton Vance Income Fund of Boston Class I
<CLASS-CONTRACT-TICKER-SYMBOL>EIBIX
</CLASS-CONTRACT>
</SERIES>
</EXISTING-SERIES-AND-CLASSES-CONTRACTS>
</SERIES-AND-CLASSES-CONTRACTS-DATA>
Ideally, I would like to scrape all the information for each tag and its subtags. However, the leaf tags (e.g., OWNER-CIK inside SERIES, or CLASS-CONTRACT-ID inside CLASS-CONTRACT) do not seem to have closing tags.
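The snippet seems to follow a simple convention. A leaf tag carries its value on the same line and is never closed. A container tag (SERIES, CLASS-CONTRACT, ...) stands alone on its line and is always closed explicitly. Assuming that convention holds for the whole header (I have only checked it against the snippet above), a small stack-based reader using only the standard library can turn it into nested dicts without any HTML parser. This is a minimal sketch: the name sgml_to_dict and the rule that repeated tags become lists are my own hypothetical choices, not part of any EDGAR tooling.

```python
import re

# Matches "<TAG>value" (leaf), "<TAG>" (open container) or "</TAG>" (close).
# Assumption: every tag sits on its own line, as in the EDGAR snippet.
LINE_RE = re.compile(r"<(/?)([A-Za-z0-9-]+)>(.*)")


def sgml_to_dict(sgml: str) -> dict:
    """Convert an EDGAR-style SGML header into nested dicts.

    Leaf tags (value on the same line, never closed) become strings;
    container tags (no value, explicitly closed) become dicts. A tag that
    occurs more than once under the same parent becomes a list.
    """
    root = {}
    # Each entry is (tag name, dict holding that container's children).
    stack = [("", root)]
    for raw in sgml.splitlines():
        match = LINE_RE.fullmatch(raw.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        closing, tag, value = match.groups()
        if closing:
            open_tag, _ = stack.pop()
            if open_tag != tag:
                raise ValueError(f"</{tag}> closes <{open_tag}>")
            continue
        value = value.strip()
        node = value if value else {}
        parent = stack[-1][1]
        if tag in parent:
            # Repeated tag under the same parent: collect all occurrences.
            if not isinstance(parent[tag], list):
                parent[tag] = [parent[tag]]
            parent[tag].append(node)
        else:
            parent[tag] = node
        if isinstance(node, dict):
            stack.append((tag, node))  # children go into this container
    return root


SAMPLE = """<SERIES>
<OWNER-CIK>0000074663
<SERIES-ID>S000004984
<SERIES-NAME>Eaton Vance Income Fund of Boston
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000013484
<CLASS-CONTRACT-NAME>Eaton Vance Income Fund of Boston Class A
<CLASS-CONTRACT-TICKER-SYMBOL>EVIBX
</CLASS-CONTRACT>
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000013485
<CLASS-CONTRACT-NAME>Eaton Vance Income Fund of Boston Class B
<CLASS-CONTRACT-TICKER-SYMBOL>EBIBX
</CLASS-CONTRACT>
</SERIES>"""

if __name__ == "__main__":
    series = sgml_to_dict(SAMPLE)["SERIES"]
    print(series["OWNER-CIK"], series["SERIES-NAME"])
    for contract in series["CLASS-CONTRACT"]:
        print(contract["CLASS-CONTRACT-ID"], contract["CLASS-CONTRACT-TICKER-SYMBOL"])
```

One caveat of this design: a tag that appears only once under its parent stays a plain dict, so a series with a single class contract gives a dict for CLASS-CONTRACT rather than a one-element list.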
Possibly for this reason, I get the following result when I run this code:
```python
from bs4 import BeautifulSoup

with open("temp.txt", 'r') as html_file:
    content = html_file.read()

soup = BeautifulSoup(content, 'lxml')
series = soup.find('series')
for item in series:
    cik = item.find('owner-cik')
    print(cik)
```
Result:
```
-1
None
```
Is there any way to parse this correctly?
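The -1/None output is most likely explained by two things. Iterating over series walks its direct children, and the first child is the whitespace text node "\n". A BeautifulSoup NavigableString is a str subclass, so item.find('owner-cik') on it is plain str.find, which returns -1. Because the leaf tags are never closed, the HTML parser also keeps owner-cik open and nests everything that follows inside it, so searching that tag for another owner-cik returns None. One possible workaround, sketched here under the assumption that every leaf tag has its value on the same line, is to add the missing closing tags first and then parse the result with the standard library's xml.etree.ElementTree instead of BeautifulSoup. The helper names close_leaf_tags and parse_series are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A leaf line carries its value after the opening tag, e.g.
# "<OWNER-CIK>0000074663". Container lines such as "<SERIES>" have no
# value and closing lines start with "</", so neither of them matches.
LEAF_RE = re.compile(r"<([A-Za-z0-9-]+)>(.+)")


def close_leaf_tags(sgml: str) -> str:
    """Turn each '<TAG>value' line into '<TAG>value</TAG>'."""
    fixed = []
    for line in sgml.splitlines():
        match = LEAF_RE.fullmatch(line.strip())
        if match:
            tag, value = match.groups()
            # Escape &, < and > so the value is valid XML character data.
            line = f"<{tag}>{escape(value.strip())}</{tag}>"
        fixed.append(line)
    return "\n".join(fixed)


def parse_series(sgml: str) -> list:
    """Return one dict per <SERIES>, with its class contracts as a list."""
    # Wrap in a synthetic root in case the input has several top-level tags.
    root = ET.fromstring("<ROOT>" + close_leaf_tags(sgml) + "</ROOT>")
    result = []
    for series in root.iter("SERIES"):
        result.append({
            "owner_cik": series.findtext("OWNER-CIK"),
            "series_id": series.findtext("SERIES-ID"),
            "series_name": series.findtext("SERIES-NAME"),
            "classes": [
                {child.tag: child.text for child in contract}
                for contract in series.findall("CLASS-CONTRACT")
            ],
        })
    return result


SAMPLE = """<SERIES-AND-CLASSES-CONTRACTS-DATA>
<EXISTING-SERIES-AND-CLASSES-CONTRACTS>
<SERIES>
<OWNER-CIK>0000074663
<SERIES-ID>S000004984
<SERIES-NAME>Eaton Vance Income Fund of Boston
<CLASS-CONTRACT>
<CLASS-CONTRACT-ID>C000013484
<CLASS-CONTRACT-NAME>Eaton Vance Income Fund of Boston Class A
<CLASS-CONTRACT-TICKER-SYMBOL>EVIBX
</CLASS-CONTRACT>
</SERIES>
</EXISTING-SERIES-AND-CLASSES-CONTRACTS>
</SERIES-AND-CLASSES-CONTRACTS-DATA>"""

if __name__ == "__main__":
    for s in parse_series(SAMPLE):
        print(s["owner_cik"], s["series_id"], s["series_name"])
        for c in s["classes"]:
            print("  ", c["CLASS-CONTRACT-ID"], c["CLASS-CONTRACT-TICKER-SYMBOL"])
```

Once the closing tags are in place, ordinary tag lookups such as findtext("OWNER-CIK") behave as expected, which is closer to what the BeautifulSoup attempt was trying to do.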