I am new to Python coding and BeautifulSoup4. I have an HTML list that I need to sort. It follows this pattern:
<div id="mgioLangSelector">
<ul id="mgioLangList">
<li><a href="" class="mgio-autonym"><span class="mgioAutonymNative" lang="am">አማርኛ</span><span class="mgioAutonymSeperator"> / </span><span class="mgioAutonymEnglish">Amharic</span></a></li>
<li><a href="" class="mgio-autonym"><span class="mgioAutonymNative" lang="hr">hrvatski</span><span class="mgioAutonymSeperator"> / </span><span class="mgioAutonymEnglish">Croatian</span></a></li>
<li><a href="" class="mgio-autonym"><span class="mgioAutonymNative" lang="cs">čeština</span><span class="mgioAutonymSeperator"> / </span><span class="mgioAutonymEnglish">Czech</span></a></li>
<li><a href="" class="mgio-autonym"><span class="mgioAutonymNative" lang="vi">tiếng Việt</span><span class="mgioAutonymSeperator"> / </span><span class="mgioAutonymEnglish">Vietnamese</span></a></li>
<li><a href="" class="mgio-autonym"><span class="mgioAutonymNative" lang="sq">shqip</span><span class="mgioAutonymSeperator"> / </span><span class="mgioAutonymEnglish">Albanian</span></a></li>
</ul>
</div>
I need to sort the list in place and save the resulting HTML. The list items should be ordered by the text of the third span in each item, the one with class mgioAutonymEnglish.
I suspect that I need to use sorted() with an appropriate key function, but am coming up blank.
I have tried the following code:
from bs4 import BeautifulSoup
from lxml import etree
soup = BeautifulSoup(open("interimResults.html"), 'lxml', from_encoding="utf-8")
matches = soup.find_all("span", attrs={"class": "mgioAutonymEnglish"})
sorted(matches, key=lambda elem: elem.text)
This produces the matched spans sorted by their text, but it does not reorder the li elements in the original list. I assume that I need to change the lambda function, but I'm currently at a loss.
What would I need to do or change to successfully sort the list and then save those changes within the HTML document?
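For reference, here is a minimal sketch of one approach that might work, not a confirmed solution for the real file. It assumes every li contains exactly one span with class mgioAutonymEnglish, uses a shortened inline sample in place of interimResults.html, and uses the built-in html.parser instead of lxml. The helper name english_name is hypothetical. The idea is to sort the li elements themselves (not the spans) and then put them back into the ul in the new order.

```python
from bs4 import BeautifulSoup

# Shortened inline sample standing in for interimResults.html (an assumption
# made so the sketch is self-contained).
html = """<div id="mgioLangSelector">
<ul id="mgioLangList">
<li><a href="" class="mgio-autonym"><span class="mgioAutonymNative" lang="hr">hrvatski</span><span class="mgioAutonymSeperator"> / </span><span class="mgioAutonymEnglish">Croatian</span></a></li>
<li><a href="" class="mgio-autonym"><span class="mgioAutonymNative" lang="vi">tiếng Việt</span><span class="mgioAutonymSeperator"> / </span><span class="mgioAutonymEnglish">Vietnamese</span></a></li>
<li><a href="" class="mgio-autonym"><span class="mgioAutonymNative" lang="sq">shqip</span><span class="mgioAutonymSeperator"> / </span><span class="mgioAutonymEnglish">Albanian</span></a></li>
</ul>
</div>"""

soup = BeautifulSoup(html, "html.parser")
lang_list = soup.find("ul", id="mgioLangList")


def english_name(li):
    """Hypothetical key function: the English language name inside one <li>."""
    return li.find("span", class_="mgioAutonymEnglish").get_text(strip=True)


# sorted() returns a new Python list; the document tree is not changed yet.
items = sorted(lang_list.find_all("li"), key=english_name)

# Empty the <ul>, then re-attach the <li> elements in sorted order.
lang_list.clear()
lang_list.append("\n")
for li in items:
    lang_list.append(li)
    lang_list.append("\n")

# Serialized document with the reordered list.
result = str(soup)
```

Writing result to a file opened with open(path, "w", encoding="utf-8") should then save the change. The comparison here is a plain case-sensitive string sort on the English names, which may need adjusting if the names mix upper and lower case.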