I am trying to work out what I need to learn, based on this existing SO post about reading values with pandas (or is there a better method?), to run an hourly weather query against WeatherBug. On the WeatherBug website, if I enter a zip code or city, I can retrieve 12 hours of future hourly weather data.

How would I put just the temperature data in a pandas DataFrame (1 column with 12 rows, representing hours 1 through 12)?

Thank you for any tips; sorry, I don't have much experience here.

import pandas as pd
import requests

url = 'https://www.weatherbug.com/weather-forecast/hourly/san-francisco-ca-94103'

header = {
  "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
  "X-Requested-With": "XMLHttpRequest"
}

r = requests.get(url, headers=header)

dfs = pd.read_html(r.text)

If I run this, I get an error:

ValueError: No tables found

marc_s
bbartling

1 Answer

read_html can only read tables (<table> elements) in an HTML document. And the error text is right: the document contains no table; the page is laid out using a bunch of div elements instead.

That means that pandas cannot process it directly. You should instead use BeautifulSoup to parse the HTML, extract the relevant info into lists and dictionaries, and then build a DataFrame from those Python containers.


The rule is that it depends on the page. The normal way is to use your browser's developer tools to see how the page is structured and identify the relevant elements. Then you check with View Source that the elements were transmitted in the HTML itself and not generated by JavaScript. Here you are lucky because the data is directly inside the HTML.
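The View Source check can also be done from Python. The following is only a minimal sketch: the helper name `appears_in_static_html` and both sample pages are made up for illustration, and in practice you would pass `r.text` together with a value you can see in the browser.

```python
from bs4 import BeautifulSoup


def appears_in_static_html(html, visible_text):
    """Return True if visible_text occurs in the rendered text of the raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop script/style contents so that only text shipped as markup is searched.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return visible_text in soup.get_text(" ", strip=True)


# Two invented pages: one ships the value as HTML, the other builds it in JavaScript.
static_page = "<div class='temp'>61°</div>"
js_page = "<div id='app'></div><script>render({temp: '61°'})</script>"

found_in_static = appears_in_static_html(static_page, "61°")
found_in_js = appears_in_static_html(js_page, "61°")
print(found_in_static, found_in_js)
```

If the value is missing from the raw HTML, the page is probably filled in by JavaScript, and requests plus BeautifulSoup alone will not see it.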

So you should:

  • use requests or urllib.request to download the page
  • use BeautifulSoup to extract the elements identified in the browser's developer tools
  • assemble the extracted values into a DataFrame
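
The steps above could look roughly like the sketch below. It is an illustration under loud assumptions: the markup in `SAMPLE_HTML` and the class names `hour-card` and `temp` are invented and stand in for the downloaded page (`r.text` in the question). The real selectors have to be read off the actual page in the developer tools.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for r.text; the class names are NOT
# taken from the real WeatherBug page.
SAMPLE_HTML = """
<div class="hourly">
  <div class="hour-card"><span class="time">1 PM</span><span class="temp">61°</span></div>
  <div class="hour-card"><span class="time">2 PM</span><span class="temp">62°</span></div>
  <div class="hour-card"><span class="time">3 PM</span><span class="temp">60°</span></div>
</div>
"""


def extract_temperatures(html):
    """Collect the temperature of each hourly card as an integer."""
    soup = BeautifulSoup(html, "html.parser")
    temps = []
    for card in soup.select("div.hour-card"):
        temp_tag = card.select_one("span.temp")
        if temp_tag is None:
            continue  # skip cards that carry no temperature
        # Strip whitespace and the trailing degree sign before converting.
        temps.append(int(temp_tag.get_text(strip=True).rstrip("°")))
    return temps


temps = extract_temperatures(SAMPLE_HTML)
# One column, one row per hour, indexed from hour 1.
df = pd.DataFrame({"temperature": temps}, index=range(1, len(temps) + 1))
df.index.name = "hour"
print(df)
```

With the real page, `SAMPLE_HTML` would be replaced by `r.text` and the two selectors by whatever elements hold the hourly values.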
Serge Ballesta
  • I updated some code that I have been experimenting with. Would you know of an example I can look at with beautiful soup? I think I understand this step of needing beautiful soup prior to pandas – bbartling Mar 16 '20 at 15:55
  • @HenryHub: I have added a few lines to my answer. For BeautifulSoup, the official documentation is really nice and easy to use. – Serge Ballesta Mar 16 '20 at 16:05
  • would you have any tips on where to even start looking for the hourly forecast values when viewing the page thru the Google Chrome developer tools? – bbartling Mar 16 '20 at 20:56
  • @HenryHub: Unsure for Chrome, but for Firefox I have a tool that lets you select an element on the page and shows where it is in the page source. – Serge Ballesta Mar 17 '20 at 06:25
  • any chance you could help me a little more? Thank you so much for giving me tips to research.. https://stackoverflow.com/questions/60728681/beautiful-soup-loop-over-div-element-in-html – bbartling Mar 17 '20 at 19:01