I am learning Python and trying to write a function that web scrapes tables of vaccination rates from several different web pages of a GitHub repository for Our World in Data (https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations/country_data; see also https://ourworldindata.org/about). The code works perfectly when scraping a single table and saving it into a data frame:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/country_data/Bangladesh.csv"
response = requests.get(url)
response  # quick check that the request succeeded (<Response [200]>)

# Parse the page and pull out GitHub's rendered CSV table by its CSS classes
scraping_html_table_BD = BeautifulSoup(response.content, "lxml")
scraping_html_table_BD = scraping_html_table_BD.find_all("table", "js-csv-data csv-data js-file-line-container")

# read_html returns a list of DataFrames; the first entry is the table we want
df = pd.read_html(str(scraping_html_table_BD))
BD_df = df[0]
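As an aside, since each page is just GitHub's rendering of a CSV file, I believe the same data can be loaded straight from the raw file URL without any HTML parsing (a minimal sketch, assuming the usual raw.githubusercontent.com layout for this repository):

import pandas as pd

# Assumed mapping: github.com/<owner>/<repo>/blob/<branch>/<path>
# becomes raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>
raw_url = ("https://raw.githubusercontent.com/owid/covid-19-data/"
           "master/public/data/vaccinations/country_data/Bangladesh.csv")
BD_df_raw = pd.read_csv(raw_url)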
But I have not had much luck trying to create a function that scrapes several pages. I have been following a tutorial (the section 'Scrape multiple pages with one script') and similar StackOverflow questions, amongst other pages. I tried creating a global variable first, but I ended up with errors like "RecursionError: maximum recursion depth exceeded while calling a Python object". The code below is the best I have managed, in that it doesn't raise an error, but I have not managed to save the output to a global variable. I really appreciate your help.
import pandas as pd
from bs4 import BeautifulSoup
import requests

link_list = ['/Bangladesh.csv',
             '/Nepal.csv',
             '/Mongolia.csv']

def get_info(page_url):
    page = requests.get('https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations/country_data' + page_url)
    scape = BeautifulSoup(page.text, 'html.parser')
    # Same table lookup as in the single-page version
    vaccination_rates = scape.find_all("table", "js-csv-data csv-data js-file-line-container")
    result = {}
    df = pd.read_html(str(vaccination_rates))
    vaccination_rates = df[0]
    df = pd.DataFrame(vaccination_rates)
    print(df)
    df.to_csv("testdata.csv", index=False)

for link in link_list:
    get_info(link)
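What I think I am after is something like the sketch below: have get_info() return its DataFrame instead of writing a CSV inside the function, then combine the per-country frames afterwards (a rough sketch of the goal, not working code; combined_df is just an illustrative name):

# Assumes get_info() is changed to end with `return df` instead of df.to_csv(...)
frames = [get_info(link) for link in link_list]
combined_df = pd.concat(frames, ignore_index=True)  # stack all countries into one frame
combined_df.to_csv("testdata.csv", index=False)     # write once, after the loop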
Edit: when I read the CSV back in, I can only see the data for the final link that is iterated, not the data from the preceding links. I suspect this is because df.to_csv("testdata.csv", index=False) runs inside the function and overwrites the file on every iteration, so only the last page's data survives.
new = pd.read_csv('testdata.csv')
pd.set_option("display.max_rows", None, "display.max_columns", None)  # show every row and column
new
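If overwriting inside the function is the problem, one variant I can imagine (a sketch, untested; the header handling via os.path.exists is my own assumption) is appending to the file on each call instead of rewriting it:

import os

def get_info(page_url):
    page = requests.get('https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations/country_data' + page_url)
    scape = BeautifulSoup(page.text, 'html.parser')
    vaccination_rates = scape.find_all("table", "js-csv-data csv-data js-file-line-container")
    df = pd.read_html(str(vaccination_rates))[0]
    # mode='a' appends each country's rows; write the header only if the file is new
    df.to_csv("testdata.csv", mode='a', index=False,
              header=not os.path.exists("testdata.csv"))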