Get the text from a div tag in html with bs4 python

Question

I have a website and I want to pull the text from a div tag on an external website using bs4. This is a Flask website.

#Importing libraries
from flask import Flask, render_template 
import sys
import json
import requests
import urllib.request
import time
from bs4 import BeautifulSoup


#Importing files and class from other python files in the project
sys.path.append('.')
from webScrape import getInformation

#Making a new app instance
app = Flask(__name__)

#If the app is on route /, then open index.html
@app.route('/')
def index():
    URL = 'https://covidstat.info/home'

    HTML = requests.get(URL)
    soup = BeautifulSoup(HTML.text, "html.parser")
    tag = soup.findAll('div', {'class': 'count'})
    print(tag.text)
    return render_template('index.html', tag=tag)

#Running the app on port 5000
if __name__== '__main__':
    app.run(debug=True, host='0.0.0.0',)

Oh, and I have another question: does anyone know how I can get an element using XPath in bs4? — , May 21 '20 at 06:45
What is the problem with your code? Are you getting any error? — deadshot, May 21 '20 at 06:47
No, I am getting this: [ 2,735,342 , 2,025,878 , 329,757 , 442 , 4 , 2,615,920 ] — , May 21 '20 at 06:49
If I knew how to use XPath with bs4, it should solve my issue — , May 21 '20 at 06:49

score 0 · Accepted Answer · answered May 21 '20 at 07:02

soup.findAll returns a list of divs in this case, so you have to access them individually, for example in a loop. You can also use a list comprehension like this:

tag_text = [t.text for t in tag]

Which returns: ['2,735,342', '2,025,878', '329,757', '442', '4', '2,615,920']

Alternatively, you can use soup.find instead, which returns only the first matching div. You can then access its text directly with tag.text, which gives '2,735,342'.
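As a minimal sketch of the difference between the two calls, the snippet below parses a small hard-coded HTML fragment instead of fetching the site. The fragment and the names SAMPLE_HTML, tags and first are made up for illustration; the real page markup will differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; the real markup will differ.
SAMPLE_HTML = """
<ul>
  <li><div class="name">first</div><div class="count">2,735,342</div></li>
  <li><div class="name">second</div><div class="count">2,025,878</div></li>
  <li><div class="name">third</div><div class="count">329,757</div></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all (findAll is the older alias) returns a list-like ResultSet of tags,
# so .text has to be read from each element, not from the ResultSet itself.
tags = soup.find_all("div", {"class": "count"})
tag_text = [t.text for t in tags]

# find returns only the first matching tag, or None if nothing matches.
first = soup.find("div", {"class": "count"})
first_text = first.text if first is not None else None

print(tag_text)
print(first_text)
```

In the Flask view, passing tag_text (a list of plain strings) to render_template instead of the raw result of findAll would probably be the simplest change.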

To get an element's XPath, use the browser inspector: right-click the text you want -> Inspect Element -> right-click the div tag -> Copy -> XPath.

The xpath for the number used before would be:

/html/body/div[1]/div/div[2]/div[2]/div/div/div[2]/div/div[1]/div/div[1]/div/div/div/ul/li[1]/div[2]

As far as I know, BS4 does not support XPath selection, so you'd have to switch to another library. I know Selenium supports it, but it would probably not be the best fit for this task.
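One library that does evaluate XPath on parsed HTML is lxml, which is a separate install from bs4 and is not mentioned in the original answer. The following is only a sketch under that assumption: the HTML fragment is invented, so the positional path here is much shorter than the one copied from the real page.

```python
from lxml import html

# Hypothetical stand-in for the page; the real markup will differ.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li><div class="name">first</div><div class="count">2,735,342</div></li>
    <li><div class="name">second</div><div class="count">2,025,878</div></li>
  </ul>
</body></html>
"""

tree = html.fromstring(SAMPLE_HTML)

# xpath() returns a list; text() selects the text nodes directly.
all_counts = tree.xpath('//div[@class="count"]/text()')

# A positional path in the style the browser inspector copies.
first_count = tree.xpath("/html/body/ul/li[1]/div[2]/text()")

print(all_counts)
print(first_count)
```

A class-based expression such as the first one tends to survive layout changes better than a long positional path, which breaks as soon as a wrapper div is added or removed.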

Mikkel Duif

Do you know how to split this up? – May 21 '20 at 11:22
Hi there, I'm not really sure what you mean by this question. Could you elaborate a bit? – Mikkel Duif May 21 '20 at 12:49
Sure, when I iterate through the text it returns a list; I want those values individually. – May 21 '20 at 12:52
I'm still not sure I understand 100%, but you can just index the list, e.g. tag_text[0] to get the first item and tag_text[-1] if you want the last one. Is this what you mean? – Mikkel Duif May 21 '20 at 13:29
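To illustrate the indexing suggested above, here is a small sketch that starts from the list shown in the answer. Converting the strings to integers is an extra step that was not asked for, and the variable names are arbitrary.

```python
# The list of strings produced by the list comprehension in the answer.
tag_text = ['2,735,342', '2,025,878', '329,757', '442', '4', '2,615,920']

# Individual values by position.
first = tag_text[0]
last = tag_text[-1]

# Or visit each value in turn.
for position, value in enumerate(tag_text):
    print(position, value)

# Optional: strip the thousands separators to get real integers.
counts = [int(value.replace(",", "")) for value in tag_text]
print(counts)
```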
