I am trying to create a dataframe from three lists that I generated from web-scraped data. However, when I turn these lists into a dictionary and use it to build my pandas dataframe, I get a separate dataframe for each item (row) rather than one dataframe containing all of the items as rows.
I believe the issue lies in the for loop that I use to scrape the data. I know similar questions have been asked before, including "Pandas DataFrame created for each row" and "Take multiple lists into dataframe", but I have tried the solutions there without success. I believe the scraping loop adds a nuance that makes this trickier.
A step-by-step walkthrough of my code and its output is below. For reference, I have imported pandas as pd, along with bs4.
# Step 1: create a web scraper that takes three sets of data (price, bedrooms and bathrooms) from a website and populates three separate lists
for container in containers:
    try:
        price_container = container.find("a", {"class": "listing-price text-price"})
        price_strip = price_container.text.strip()
        price_list = []
        price_list.append(price_strip)
    except TypeError:
        continue
    try:
        bedroom_container = container.find("span", {"class": "icon num-beds"})
        bedroom_strip = bedroom_container["title"]
        bedroom_list = []
        bedroom_list.append(bedroom_strip)
    except TypeError:
        continue
    try:
        bathroom_container = container.find("span", {"class": "icon num-baths"})
        bathroom_strip = bathroom_container["title"]
        bathroom_list = []
        bathroom_list.append(bathroom_strip)
    except TypeError:
        continue
    # Step 2: create a dictionary
    data = {'price': price_list, 'bedrooms': bedroom_list, 'bathrooms': bathroom_list}
    # Step 3: turn it into a pandas dataframe and print the output
    d = pd.DataFrame(data)
    print(d)
This gives me a dataframe for each dictionary as below.
      price bedrooms bathrooms
0  £200,000        3         2
[1 rows x 3 columns]
      price bedrooms bathrooms
0  £400,000        5         3
[1 rows x 3 columns]
      price bedrooms bathrooms
0  £900,000        6         4
[1 rows x 3 columns]
and so on...
I've tried dictionary comprehension and list comprehension, to give me one dataframe rather than a dataframe for each dictionary item:
data = [({'price':price, 'bedrooms':bedrooms, 'bathrooms':bathrooms}) for item in container]
df = pd.DataFrame(data)
print(df)
However I write the list comprehension, this yields an even stranger output: a dataframe for each item, with the same row of information repeated several times.
      price bedrooms bathrooms
0  £200,000        3         2
0  £200,000        3         2
0  £200,000        3         2
[3 rows x 3 columns]
      price bedrooms bathrooms
0  £400,000        5         3
0  £400,000        5         3
0  £400,000        5         3
[3 rows x 3 columns]
      price bedrooms bathrooms
0  £900,000        6         4
0  £900,000        6         4
0  £900,000        6         4
[3 rows x 3 columns]
and so on...
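The repeated rows are consistent with the comprehension never using its loop variable: `item` changes on each pass, but the dictionary is built from the same `price`, `bedrooms` and `bathrooms` values every time. A minimal sketch of that effect follows. The `children` list is a hypothetical stand-in for whatever iterating over one container yields, and the three values are made up for illustration.

```python
import pandas as pd

# Made-up values standing in for one scraped listing.
price, bedrooms, bathrooms = "£200,000", "3", "2"

# Hypothetical stand-in for the items produced by iterating over one container.
children = ["child-1", "child-2", "child-3"]

# `item` is never used inside the dict, so every element is identical.
data = [{"price": price, "bedrooms": bedrooms, "bathrooms": bathrooms} for item in children]

df = pd.DataFrame(data)
print(df)
```

Each pass over `children` adds another copy of the same dictionary, so the number of repeated rows simply tracks how many items the iterable contains.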
How do I resolve this problem and get all of my data into one pandas dataframe?
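One possible restructuring, offered as a sketch rather than a definitive fix: collect one dictionary per listing in a list created before the loop, then build the dataframe once after the loop has finished. The `html` string and the `listing` wrapper class below are assumptions made up so the example runs on its own; only the three inner class names come from the code above. Missing tags are skipped with an explicit `None` check instead of `try`/`except`.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page content; the "listing" wrapper class is an assumption.
html = """
<div class="listing">
  <a class="listing-price text-price"> £200,000 </a>
  <span class="icon num-beds" title="3"></span>
  <span class="icon num-baths" title="2"></span>
</div>
<div class="listing">
  <a class="listing-price text-price"> £400,000 </a>
  <span class="icon num-beds" title="5"></span>
  <span class="icon num-baths" title="3"></span>
</div>
<div class="listing">
  <a class="listing-price text-price"> £900,000 </a>
  <span class="icon num-beds" title="6"></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
containers = soup.find_all("div", {"class": "listing"})

rows = []  # created once, before the loop, so it accumulates across listings
for container in containers:
    price_tag = container.find("a", {"class": "listing-price text-price"})
    bed_tag = container.find("span", {"class": "icon num-beds"})
    bath_tag = container.find("span", {"class": "icon num-baths"})
    # Skip listings that lack any of the three fields (the third one here).
    if price_tag is None or bed_tag is None or bath_tag is None:
        continue
    rows.append({
        "price": price_tag.text.strip(),
        "bedrooms": bed_tag["title"],
        "bathrooms": bath_tag["title"],
    })

# Build and print the dataframe once, after the loop.
df = pd.DataFrame(rows)
print(df)
```

Keeping a single list of row dictionaries also keeps the three fields aligned, because a listing is either added in full or skipped in full.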