I am scraping England's National Joint Registry (NJR) data, and I get the results in the format I want when I scrape one hospital at a time. I eventually want to iterate over all hospitals, but first I made a list of three hospitals so I could work out the iteration.
For a single hospital, the code below produces the final results in the format I want, as a pandas DataFrame:
import requests
from bs4 import BeautifulSoup
import pandas
import numpy as np

r = requests.get("http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Norfolk%20and%20Norwich%20Hospital")
c = r.content
soup = BeautifulSoup(c, "html.parser")
# use the second "toggle_container" div on the page
all = soup.find_all(["div"], {"class": "toggle_container"})[1]
i = 0
temp = []
for item in all.find_all("td"):
    # at the start of every group of 4 cells, prepend the span and h5 text
    if i % 4 == 0:
        temp.append(soup.find_all("span")[4].text)
        temp.append(soup.find_all("h5")[0].text)
    temp.append(all.find_all("td")[i].text.replace(" ", ""))
    i = i + 1
# 2 prepended values + 4 cells = 6 columns per row
table = np.array(temp).reshape(12, 6)
final = pandas.DataFrame(table)
final
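The `reshape(12, 6)` call hardcodes the row count for this one hospital. Because each group of four `td` cells contributes six values (the `span` text, the `h5` text, then the four cells), passing `-1` lets NumPy infer the number of rows instead. A minimal sketch, with made-up placeholder strings standing in for the scraped text:

```python
import numpy as np

# Hypothetical placeholder data standing in for the scraped text:
# each group is [span text, h5 text, cell 1, cell 2, cell 3, cell 4]
temp = [
    "span", "h5", "a1", "a2", "a3", "a4",
    "span", "h5", "b1", "b2", "b3", "b4",
]

# -1 tells NumPy to infer the row count from the total length and the 6 columns
table = np.array(temp).reshape(-1, 6)
print(table.shape)  # (2, 6)
```

This only works when the number of `td` cells is a multiple of 4; otherwise the total length is not a multiple of 6 and `reshape` raises a `ValueError`.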
In the iterated version, I can't figure out how to append each hospital's result set to a final DataFrame:
hosplist = ["http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Norfolk%20and%20Norwich%20Hospital",
            "http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Barnet%20Hospital",
            "http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Altnagelvin%20Area%20Hospital"]
temp2 = []
df_final = pandas.DataFrame()
for item in hosplist:
    r = requests.get(item)
    c = r.content
    soup = BeautifulSoup(c, "html.parser")
    all = soup.find_all(["div"], {"class": "toggle_container"})[1]
    i = 0
    temp = []
    for item in all.find_all("td"):
        if i % 4 == 0:
            temp.append(soup.find_all("span")[4].text)
            temp.append(soup.find_all("h5")[0].text)
        temp.append(all.find_all("td")[i].text)
        i = i + 1
    table = np.array(temp).reshape(int(len(temp) / 6), 6)
    temp2.append(table)
#df_final = pandas.DataFrame(df)
At the end, `temp2` holds all the data I want (one `table` array per hospital), but it's not easy to manipulate, so I want to put it in a DataFrame. However, I get a "ValueError: Must pass 2-d input" error.
I think the error means that `temp2`, a list of three 2-D arrays, is effectively 3-dimensional. This is just a practice run; there are over 400 hospitals whose data I plan to put into a DataFrame, but I am stuck here.
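`pandas.DataFrame` needs 2-D input, so the per-hospital arrays have to be joined along the row axis rather than wrapped in an outer list. A minimal sketch of two common ways to do that, using small made-up arrays in place of the scraped `temp2` (the values are hypothetical; only the six-column shape matches the question):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the per-hospital `table` arrays collected in temp2.
# Each is 2-D with 6 columns; the row count can differ between hospitals.
temp2 = [
    np.array([["A", "h", "1", "2", "3", "4"]]),
    np.array([["B", "h", "5", "6", "7", "8"],
              ["B", "h", "9", "10", "11", "12"]]),
]

# Option 1: stack the 2-D arrays vertically into one 2-D array, then build the DataFrame
df_final = pd.DataFrame(np.vstack(temp2))

# Option 2: build one DataFrame per hospital and concatenate them,
# renumbering the rows instead of repeating each hospital's own index
df_final2 = pd.concat([pd.DataFrame(t) for t in temp2], ignore_index=True)

print(df_final.shape)  # (3, 6)
```

`np.vstack` only requires every array to have the same number of columns (six here), so hospitals with different numbers of rows are fine. With 400+ hospitals, collecting the arrays in a list and combining them once after the loop is also cheaper than growing a DataFrame inside the loop.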