
I have a Google Sheet with coronavirus data. I want to update it using the Worldometers site. I don't want to copy the CSS selector for every single one of the thousands of cells in my code.

I tried getting the table, but the text is separated by newlines after each cell. I got the table with the following code:

    import bs4
    import requests

    res = requests.get('https://www.worldometers.info/coronavirus')
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    print(len(soup.select('table')))
    txt = soup.select('table')[1]
    print(txt.text)

Is there a way to convert what we get from the table into a format that can be put into Excel, or to put the table HTML itself into Excel so that it will be formatted properly?

Anonymous
  • If you use Power Query in Excel, and just type in that URL, it can easily download the "now" and "yesterday" tables into Excel. If you need to break down each country, it is more complicated. – Ron Rosenfeld May 02 '20 at 23:32

2 Answers


I suggest you take a look at the following python modules:

  1. pandas: for handling tabular data (official documentation, guide to convert an HTML table into Excel);
  2. openpyxl: for Excel spreadsheets (see this guide);
  3. EZSheets: for Google Spreadsheets (see this guide).

I hope these resources will help you.

Edit: the code below (based on this) should allow you to retrieve the table.

    import pandas as pd
    import requests

    url = "https://www.worldometers.info/coronavirus"
    hdr = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest"
    }

    # pd.read_html cannot send custom headers (see
    # https://stackoverflow.com/questions/45943179/python-pandas-typeerror-first-argument-must-be-string-or-compiled-pattern#45944194),
    # so fetch the page with requests, which can, and parse the downloaded HTML.
    res = requests.get(url, headers=hdr)
    res.raise_for_status()

    # DataFrame.to_excel creates the file itself, so no need to pre-create a workbook.
    table = pd.read_html(res.text)[0]
    table.to_excel("coronaData.xlsx")

Calling pd.read_html(url) directly keeps raising the error urllib.error.HTTPError: HTTP Error 403: Forbidden, because the site rejects urllib's default user agent; downloading the page with requests and the hdr headers first works around this. Maybe it would be easier with another website that does not block the default agent (no hdr needed; e.g. Wikipedia).
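For reference, the same header trick also works with the standard library alone: the 403 comes from urllib's default User-Agent, which can be overridden on the Request object. A minimal sketch (hdr repeated here so it is self-contained; `urlopen(req)` would perform the actual download):

```python
from urllib.request import Request

url = "https://www.worldometers.info/coronavirus"
hdr = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
}

# Attach the headers to the request object; urlopen(req) would then send
# the browser-like User-Agent instead of urllib's default Python-urllib/x.y.
req = Request(url, headers=hdr)
print(req.get_header("User-agent"))
```

Note that `Request` normalizes header names with `str.capitalize()`, which is why it is read back as "User-agent".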

westload

I think I found an answer to my question. I did some research and found out how to build a CSV file by appending each row. I can also pull in the population data using the same method. After I got the CSV file, I put it in my Google Drive and converted it to a Google Sheet. The code is shown below.

    import bs4
    import requests
    import csv

    res = requests.get('https://www.worldometers.info/coronavirus')
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    print(len(soup.select('table')))
    txt = soup.select('table')[1]
    txt2 = bs4.BeautifulSoup(
        requests.get('https://www.worldometers.info/world-population/population-by-country/').text,
        'html.parser').select('table')[0]

    output_covidrows = []
    output_populationrows = []

    for table_row2 in txt2.findAll('tr'):
        output_row2 = [column2.text.strip() for column2 in table_row2.findAll('td')]
        output_populationrows.append(output_row2)

    for table_row in txt.findAll('tr'):
        output_row = [column.text.strip() for column in table_row.findAll('td')]
        if not output_row:
            continue
        # Look up this country in the population table (the country name is
        # column 0 here but column 1 there) and append population and median age.
        for populationentry in output_populationrows:
            if len(populationentry) < 10:
                continue
            if populationentry[1] == output_row[0]:
                output_row.append(populationentry[2])
                output_row.append(populationentry[9])
                break
        if len(output_row) > 2:
            output_covidrows.append(output_row)

    with open('output.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(
            ['Country', 'Total Cases', 'New Cases', 'Total Deaths', 'New Deaths',
             'Total Recovered', 'Active Cases',
             'Serious Cases', 'Tot Cases per 1m', 'Deaths per 1m', 'Total tests',
             'Tests per 1m', 'Region', 'Population',
             'Med Age'])
        writer.writerows(output_covidrows)

    print('Done')
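One possible refinement: scanning the whole population list once per coronavirus row is quadratic. Building a dict keyed by country name first makes each lookup constant-time. A sketch with made-up sample rows standing in for the scraped tables (the real rows come from the two sites above):

```python
# Hypothetical sample rows: population rows follow the population table's
# layout ([rank, country, population, ..., median_age at index 9]) and
# covid rows start with the country name.
population_rows = [
    ['1', 'China', '1439323776', '', '', '', '', '', '', '38'],
    ['2', 'India', '1380004385', '', '', '', '', '', '', '28'],
]
covid_rows = [
    ['India', '37336', '1902'],
    ['China', '82875', '2'],
]

# Build the lookup table once: country name -> (population, median age).
population_by_country = {
    row[1]: (row[2], row[9]) for row in population_rows if len(row) >= 10
}

# Each covid row now gets its extra columns with a single dict lookup.
for row in covid_rows:
    extra = population_by_country.get(row[0])
    if extra:
        row.extend(extra)

print(covid_rows)
```

Countries missing from the population table are simply left without the extra columns, matching the original code's behavior.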
Anonymous