
I am trying to build a list of dates, but my code produces a list with only one element: a single long string that contains all the dates. I need to convert that one element into many elements, one per date.

Code:

import requests
from bs4 import BeautifulSoup
import ast

#gets the dates 
URL = ("https://www.worldometers.info/coronavirus/country/us/")
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results=soup.find_all("div", {"class": "col-md-12"})
data=results[4]
script = data.find('script')
string = script.string
datesTEMP = string.strip()[292:]#-9268]
x=datesTEMP

i=0
xaxis=[]
count=0
while count<1:
    if x[i]=="]":
        count=count+1
    else:
        xaxis.append(x[i])
        i=i+1

xaxislength=len(xaxis)

xaxis = [''.join(xaxis[0:xaxislength])]


print(xaxis)
Teerath
  • Can you show the value of the variables in question, and your expected output? This would make the example easier to mentally follow. You could even remove the first parts of the sample then which are about how you got the data in the first place. – Nicolas78 Aug 03 '21 at 07:22
  • I don't understand what you want to do. Better show example data, and what you get, and what you expect. – furas Aug 03 '21 at 08:42
  • it seems you get data from JavaScript and you have something similar to `JSON` so you could use module `json` to convert it to Python list. – furas Aug 03 '21 at 08:46

1 Answer


You are trying to get values from JavaScript data (Highcharts) that sit between categories: [ and ], so you can use split() to cut out this part. If you then add the brackets back ("[" + string + "]"), you have a string with JSON data, which you can convert to a Python list using the json module.

When I ran it, this gave a list with 535 elements:

import requests
from bs4 import BeautifulSoup
import json

url = "https://www.worldometers.info/coronavirus/country/us/"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

results = soup.find_all("div", {"class": "col-md-12"})

string = results[4].find('script').string

# remove string before dates
substring = string.split("categories: [")[1]

# remove string after dates
substring = substring.split("]")[0]

# add the brackets back so the string is a valid JSON list
substring = "[" + substring + "]"

#print(substring)

# convert string with JSON data into Python list
dates = json.loads(substring)

print('len(dates):', len(dates))
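
The same split-and-parse step can be tried offline, without requesting the page. This is only a sketch: `sample_script` is a made-up, shortened stand-in for the real script text, and `extract_categories` is a hypothetical helper name, not something from the page or from any library.

```python
import json

# Made-up, shortened stand-in for the <script> text found on the page.
# The real text is much longer; only the "categories: [...]" part matters here.
sample_script = """
Highcharts.chart('coronavirus-cases-linear', {
    xAxis: {
        categories: ["Feb 15, 2020","Feb 16, 2020","Feb 17, 2020"]
    },
    series: [{ data: [15,15,15] }]
});
"""


def extract_categories(script_text):
    """Return the list found between 'categories: [' and the next ']'."""
    marker = "categories: ["
    if marker not in script_text:
        raise ValueError("marker 'categories: [' not found in script text")
    # keep only the text after the marker
    after_marker = script_text.split(marker, 1)[1]
    # keep only the text before the first closing bracket
    inner = after_marker.split("]", 1)[0]
    # add the brackets back and let json do the parsing
    return json.loads("[" + inner + "]")


dates = extract_categories(sample_script)
print(dates)
print('len(dates):', len(dates))
```

Using json instead of splitting on "," matters here because every date contains a comma itself ("Feb 15, 2020"), so a plain split(",") would cut each date in two.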

BTW:

My answer for the same page from almost 1.5 years ago:

Web scrape coronavirus interactive plots

I also found code on my blog:

Scraping: How to get data from interactive plot created with HighCharts

But I would rather get the CSV file from the GitHub repository CSSEGISandData / COVID-19, load it into pandas, and calculate all the values there. I think I described this in an answer a long time ago.
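
A rough sketch of the CSV idea, with several assumptions: `sample_csv` is made-up data, and its layout (a few metadata columns followed by one column per date with cumulative counts) is assumed to resemble the files in that repository, so check it against the real file. To stay dependency-free this uses the standard csv module instead of pandas, and `country_series` is a hypothetical helper name.

```python
import csv
import io

# Made-up sample. ASSUMPTION: the real file has a similar "wide" layout,
# with metadata columns first and then one cumulative-count column per date.
sample_csv = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,US,40.0,-100.0,1,1,2
,Italy,41.9,12.6,0,0,0
"""

# columns that describe the row instead of holding a daily value
META_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def country_series(csv_text, country):
    """Sum the per-date columns over all rows that belong to one country."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Country/Region"] != country:
            continue
        for column, value in row.items():
            if column not in META_COLUMNS:
                totals[column] = totals.get(column, 0) + int(value)
    return totals


cumulative = country_series(sample_csv, "US")
dates = list(cumulative)            # date labels, in file order
values = list(cumulative.values())  # cumulative counts

# difference between consecutive days gives the daily new cases
daily_new = [values[0]] + [b - a for a, b in zip(values, values[1:])]

print(dates)
print(daily_new)
```

With pandas the same steps would be reading the file, filtering by country, and taking differences along the date columns; the version above only shows the shape of the calculation.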

furas