I am trying to extract this part from a page:
Using the inspect I see that:
Is the structure defined in the inspect view always follows what bs4
returns?
I am using:
import json
import requests
from bs4 import BeautifulSoup
url = "https://docs.google.com/document/d/e/2PACX-1vSWVk1yd_I_zhVROYN2wv1r1y_54M-QL0199ZQ4g9mQZ7QdzekVzsRFUB_JVfkInwLxDNPmrwlY2x7y/pub?fbclid=IwAR0BsTNrbDeLb6j7tU2XhVxeh9WaQU_vELyDS3oNvem3eapiJ1zoBqZIYes"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = soup.find_all('span',"c2")
But it returns:
[<span class="c2"></span>,
<span class="c2"></span>,
<span class="c2">Gyventojai, kuriems yra daugiau nei 65 metai (1.14 prioritetas)</span>,
<span class="c2"></span>,
<span class="c2">——————————————————————</span>,
<span class="c2"></span>,
<span class="c2"></span>,
<span class="c2"></span>,
<span class="c2">Švietimo sistemos darbuotojai bei abiturientai (1.15 prioritetas)</span>,
<span class="c2">Diplomatai (1.16)</span>,
<span class="c2">Sergantieji lėtinėmis ligomis (1.17)</span>,
<span class="c2">Socialinių paslaugų teikėjai (1.18)</span>,
<span class="c2">1.20 prioritetas: gyvybiškai svarbias valstybės funkcijas atliekantys asmenys, kontaktuojantys su kitais asmenimis (pareigūnai, prekybos įmonių salės darbuotojai ir kt.), išskyrus bendrųjų funkcijų darbuotojus. Šiuo metu šio prioriteto sąrašai nuolat keliami.</span>,
<span class="c2">Gyventojų grupė 55-64 m.</span>,
<span class="c2"></span>,
<span class="c2">.</span>,
<span class="c2"></span>,
<span class="c2"></span>,
<span class="c2"></span>]
Which does not include <p class="c6"><span class="c2">ŠIUO METU - TIK SENJORAI:</span></p>
And I am unsure why because it clearly states class c2
in both inspect view and the data
returned by bs4
.
Should I always follow the nested structure with multiple find
statements or what is the best practice to get the data I desire?