import requests

from bs4 import BeautifulSoup


def get_data_from_web():
    url = "http://mohfw.gov.in"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    div = soup.find('div', class_='col-xs-8 site-stats-count')
    li = div.find_all('li')
    print(li)
    
get_data_from_web()

I'm trying to extract the coronavirus stats from http://mohfw.gov.in, but I'm getting only the first li, while there are 3 li in total.

I also tried selecting those li tags by their specific classes, but then I get None back.

<div class="col-xs-8 site-stats-count"> 
    <ul style="margin-top:0px;">
        <li class="bg-blue">
        <strong class="mob-hide">Active &nbsp;<span class="active_per"></span></strong>
        <strong class="mob-hide">973175<span class='up'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(14859<i class='fa fa-arrow-up'></i>)</span></strong>
        <!--<span class='down'>3565 <i class='fa fa-arrow-down'></i></span>-->      
        <span class="mob-show">Active </span>
        <span class="mob-show"><span class="active_per"></span> </span> 
        <span class="mob-show"><strong>973175<span class='up'><br>(14859<i class='fa fa-arrow-up'></i>)</span></strong></span> </span>  
        </li> 
        <li class="bg-green">
        <strong class="mob-hide">Discharged &nbsp;<span class="discharged_per"></span></strong>
        <strong class="mob-hide">3702595<span class='cup'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(78399<i class='fa fa-arrow-up'></i>)</span></strong>
        <span class="mob-show">Discharged </span>
        <span class="mob-show"><span class="discharged_per"></span> </span> 
        <span class="mob-show"><strong>3702595<span class='cup'><br>(78399<i class='fa fa-arrow-up'></i>)</span></strong></span> </span>
        </li>                               
        <li class="bg-red">
        <strong class="mob-hide">Deaths &nbsp;<span class="death_per"></span></strong>
        <strong class="mob-hide">78586&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class='up'>   (1114<i class='fa fa-arrow-up'></i>)</span></strong>
        <span class="mob-show">Deaths </span>
        <span class="mob-show"><span class="death_per"></span> </span>  
        <span class="mob-show"><strong>78586<span class='up'><br>(1114<i class='fa fa-arrow-up'></i>)</span></strong></span> </span>
        <!--<span class='down'> <i class='fa fa-arrow-down'></i></span>-->      
        </li>
        </ul></div>
Sahil

2 Answers


The HTML markup on that page is broken. Different parsers repair broken markup differently, so try parsing it with lxml or html5lib instead of html.parser:

import requests
from bs4 import BeautifulSoup


def get_data_from_web():
    url = "http://mohfw.gov.in"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'lxml')      # <-- change to lxml or html5lib
    div = soup.find('div', class_='col-xs-8 site-stats-count')
    lis = div.find_all('li')
    for li in lis:
        print(li)
        print('-' * 80)

get_data_from_web()

Prints:

<li class="bg-blue">
<strong class="mob-hide">Active  <span class="active_per"></span></strong>
<strong class="mob-hide">973175<span class="up">     (14859<i class="fa fa-arrow-up"></i>)</span></strong>
<!--<span class='down'>3565 <i class='fa fa-arrow-down'></i></span>-->
<span class="mob-show">Active </span>
<span class="mob-show"><span class="active_per"></span> </span>
<span class="mob-show"><strong>973175<span class="up"><br/>(14859<i class="fa fa-arrow-up"></i>)</span></strong></span>
</li>
--------------------------------------------------------------------------------
<li class="bg-green">
<strong class="mob-hide">Discharged  <span class="discharged_per"></span></strong>
<strong class="mob-hide">3702595<span class="cup">     (78399<i class="fa fa-arrow-up"></i>)</span></strong>
<span class="mob-show">Discharged </span>
<span class="mob-show"><span class="discharged_per"></span> </span>
<span class="mob-show"><strong>3702595<span class="cup"><br/>(78399<i class="fa fa-arrow-up"></i>)</span></strong></span>
</li>
--------------------------------------------------------------------------------
<li class="bg-red">
<strong class="mob-hide">Deaths  <span class="death_per"></span></strong>
<strong class="mob-hide">78586     <span class="up">   (1114<i class="fa fa-arrow-up"></i>)</span></strong>
<span class="mob-show">Deaths </span>
<span class="mob-show"><span class="death_per"></span> </span>
<span class="mob-show"><strong>78586<span class="up"><br/>(1114<i class="fa fa-arrow-up"></i>)</span></strong></span>
<!--<span class='down'> <i class='fa fa-arrow-down'></i></span>-->
</li>
--------------------------------------------------------------------------------
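Once all three li elements are available, the label and counts still have to be pulled out of the markup. The following is only a sketch of one way to do that, and it deliberately swaps BeautifulSoup for the standard library's html.parser.HTMLParser so that it runs without extra packages. SAMPLE is an abridged copy of the markup from the question, and the names StatsParser and parse_stats are hypothetical helpers introduced here for illustration.

```python
import re
from html.parser import HTMLParser

# Abridged copy of the markup quoted in the question (assumption: the live
# page may have changed since). The stray </span> is kept on purpose.
SAMPLE = """
<ul style="margin-top:0px;">
<li class="bg-blue">
<strong class="mob-hide">Active &nbsp;<span class="active_per"></span></strong>
<strong class="mob-hide">973175<span class='up'>&nbsp;&nbsp;(14859<i class='fa fa-arrow-up'></i>)</span></strong>
<span class="mob-show"><strong>973175<span class='up'><br>(14859<i class='fa fa-arrow-up'></i>)</span></strong></span> </span>
</li>
<li class="bg-green">
<strong class="mob-hide">Discharged &nbsp;<span class="discharged_per"></span></strong>
<strong class="mob-hide">3702595<span class='cup'>&nbsp;&nbsp;(78399<i class='fa fa-arrow-up'></i>)</span></strong>
</li>
<li class="bg-red">
<strong class="mob-hide">Deaths &nbsp;<span class="death_per"></span></strong>
<strong class="mob-hide">78586&nbsp;&nbsp;<span class='up'>   (1114<i class='fa fa-arrow-up'></i>)</span></strong>
</li>
</ul>
"""


class StatsParser(HTMLParser):
    """Collect the text inside <strong class="mob-hide"> for every <li>."""

    def __init__(self):
        super().__init__()
        self.items = []        # one accumulated text string per <li>
        self._capture = False  # True while inside <strong class="mob-hide">

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.items.append("")
        elif tag == "strong" and ("class", "mob-hide") in attrs:
            self._capture = True

    def handle_endtag(self, tag):
        # Stray end tags such as the extra </span> simply arrive here as
        # events and are ignored, because only </strong> changes state.
        if tag == "strong":
            self._capture = False

    def handle_data(self, data):
        if self._capture and self.items:
            self.items[-1] += data


def parse_stats(html):
    """Return {label: (total, daily_change)} for each <li> in the markup."""
    parser = StatsParser()
    parser.feed(html)
    parser.close()
    # Label, then the total, then the change given in parentheses.
    pattern = re.compile(r"([A-Za-z]+)\D*(\d+)\D*\((\d+)\)")
    stats = {}
    for text in parser.items:
        match = pattern.search(text)
        if match:
            stats[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return stats


print(parse_stats(SAMPLE))
```

The same regular expression could equally be applied to li.get_text() for each li returned by find_all in the function above; the event-based parser is used here only to keep the sketch dependency-free.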
Andrej Kesely
  • Just FYI - for additional reading - https://stackoverflow.com/questions/25714417/beautiful-soup-and-table-scraping-lxml-vs-html-parser – DS_ Sep 13 '20 at 17:12

I printed the whole div, and with html.parser the div seems to end right after the first li tag. The code is below; run it once and you will see.

import requests
from bs4 import BeautifulSoup

def get_data_from_web():
    url = "http://mohfw.gov.in"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    div = soup.find('div', class_='col-xs-8 site-stats-count')
    # Print the whole div as html.parser sees it, not just its <li> children
    print(div)

get_data_from_web()

Here is the output:

<div class="col-xs-8 site-stats-count">
<ul style="margin-top:0px;">
<li class="bg-blue">
<strong class="mob-hide">Active  <span class="active_per"></span></strong>
<strong class="mob-hide">973175<span class="up">     (14859<i class="fa fa-arrow-up"></i>)</span></strong>
<!--<span class='down'>3565 <i class='fa fa-arrow-down'></i></span>-->
<span class="mob-show">Active </span>
<span class="mob-show"><span class="active_per"></span> </span>
<span class="mob-show"><strong>973175<span class="up"><br/>(14859<i class="fa fa-arrow-up"></i>)</span></strong></span> </li></ul></div>
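The parsed div stops exactly where the markup quoted in the question has one more closing span than opening ones, which appears to be what trips html.parser. As a rough way to confirm that imbalance, here is a small sketch that uses only the standard library's html.parser.HTMLParser. FIRST_LI is copied from the question, TagBalance and find_stray_end_tags are hypothetical names introduced for illustration, and the check is deliberately simplistic: it only flags an end tag that does not match the most recently opened tag.

```python
from html.parser import HTMLParser

# First <li> of the markup quoted in the question, copied as-is.
FIRST_LI = """
<li class="bg-blue">
<strong class="mob-hide">Active &nbsp;<span class="active_per"></span></strong>
<strong class="mob-hide">973175<span class='up'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(14859<i class='fa fa-arrow-up'></i>)</span></strong>
<!--<span class='down'>3565 <i class='fa fa-arrow-down'></i></span>-->
<span class="mob-show">Active </span>
<span class="mob-show"><span class="active_per"></span> </span>
<span class="mob-show"><strong>973175<span class='up'><br>(14859<i class='fa fa-arrow-up'></i>)</span></strong></span> </span>
</li>
"""

# Elements that never take a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagBalance(HTMLParser):
    """Track open tags and record end tags that have nothing to close."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, innermost last
        self.stray = []  # end tags that did not match the innermost open tag

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.stray.append(tag)


def find_stray_end_tags(html):
    """Return the list of unmatched end tags found in the markup."""
    checker = TagBalance()
    checker.feed(html)
    checker.close()
    return checker.stray


print(find_stray_end_tags(FIRST_LI))
```

The commented-out span in the markup is reported to the parser as a comment, so it does not affect the count.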
DS_
  • Check the question: I replaced the image with the HTML code. It doesn't seem like the div ends with the first li tag; see line 255 of https://www.mohfw.gov.in/ – Sahil Sep 13 '20 at 05:09