I am learning to scrape data from website through Python. Extracting weather information about San Francisco from this page. I get stuck while combining data into a Pandas Dataframe. Is it possible to create a dataframe where each rows have different length?
I have already tried 2 ways based on answers here, but they are not excatly what I am looking for. Both answers shift the values of temps column to up. Here is the screen what I try to explain..
1st way: https://stackoverflow.com/a/40442094/10179259
2nd way: https://stackoverflow.com/a/19736406/10179259
import requests
from bs4 import BeautifulSoup
import pandas as pd
page = requests.get("http://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168")
soup = BeautifulSoup(page.content, 'html.parser')
seven_day = soup.find(id="seven-day-forecast")
forecast_items = seven_day.find_all(class_="tombstone-container")
periods=[pt.get_text() for pt in seven_day.select('.tombstone-container .period-name')]
short_descs=[sd.get_text() for sd in seven_day.select('.tombstone-container .short-desc')]
temps=[t.get_text() for t in seven_day.select('.tombstone-container .temp')]
descs = [d['alt'] for d in seven_day.select('.tombstone-container img')]
#print(len(periods), len(short_descs), len(temps), len(descs))
weather = pd.DataFrame({
"period": periods, #length is 9
"short_desc": short_descs, #length is 9
"temp": temps, #problem here length is 8
#"desc":descs #length is 9
})
print(weather)
I expect that first row of the temp column to be Nan. Thank you.