
So I'm trying to use the National Weather Service API to get a single element from a table. Specifically, I'm trying to get the current weather where I live. The entire page is just one giant table, and I haven't found a way to pull out a single piece of information beyond printing everything Beautiful Soup grabs. How do I get the period 0 `shortForecast` value (if you go to the URL below), or how do I get started with the basic setup? Any help will be appreciated (also, sorry for the extra imports; I've been trying a lot of different approaches).

import urllib.request  # was missing; url_content() uses it below
import requests
from bs4 import BeautifulSoup
import lxml.html as lh
import pandas as pd
from html_table_parser import HTMLTableParser as pars
from pprint import pprint

URL = 'https://api.weather.gov/gridpoints/ILN/22,23/forecast/hourly'

def url_content(url): 
    req = urllib.request.Request(url=url) 
    f = urllib.request.urlopen(req)
    return f.read()

def main():
    xhtml = url_content(URL).decode('utf-8')
    p = pars() 
    p.feed(xhtml) 
    pprint(p.tables[0]) 

if __name__ == "__main__":
    main()

This is the error, if it helps, raised at the `pprint(p.tables[0])` line in `main()`:

IndexError: list index out of range
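For reference, the endpoint returns JSON, not HTML, so there is no table for the parser to find (hence the empty `tables` list). A minimal sketch of getting the period 0 `shortForecast` directly, assuming the standard `properties.periods` shape of the api.weather.gov forecast response:

```python
import json
import urllib.request

URL = 'https://api.weather.gov/gridpoints/ILN/22,23/forecast/hourly'

def fetch_forecast(url=URL):
    # api.weather.gov asks clients to identify themselves with a User-Agent
    req = urllib.request.Request(url, headers={'User-Agent': 'weather-script'})
    with urllib.request.urlopen(req) as f:
        return json.loads(f.read().decode('utf-8'))

def short_forecast(data, period=0):
    # Hourly forecast periods live under properties.periods; period 0 is
    # the most recent hour. Key names follow the api.weather.gov response.
    return data['properties']['periods'][period]['shortForecast']

# Usage: print(short_forecast(fetch_forecast()))
```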

Harlaquin
    This looks like JSON. Any reason not to use `json.loads(f.read())`? – ggorlen Feb 02 '21 at 01:12
  • Everything I'm using is python – Harlaquin Feb 04 '21 at 08:13
  • I can see that. I'm talking about the data returned by the API. See [this](https://stackoverflow.com/questions/7771011/how-to-parse-data-in-json-format) – ggorlen Feb 04 '21 at 14:22
  • I see what you're talking about now, and thank you; I was confused about how the API worked. Really my last problem is that when I get the data from Beautiful Soup it gives me everything, which includes 155 hours of data, every hour being a new update with weather data. Is there a way to prevent soup from grabbing everything and limit it to the latest hour? Also, because there's so much data printed to my IDE, I can't see the set between 0-89. I'm forced to see everything from 90 hours ago, which doesn't help in the slightest. If you have any ideas it would be a great help. – Harlaquin Feb 09 '21 at 23:15
  • No problem, happy to help. You don't need beautifulsoup; that's an HTML parser. This is a JSON string which you can parse to a dict as shown above. If you want fewer than 90 hours, consult the API documentation to figure out how to modify the GET request with parameters that limit the count of records returned, or (less ideally) parse the whole thing and only access the data you need. It's always nicer to change it upstream so less data is transmitted, but if they deliver the same payload and don't offer a knob, the payload isn't so huge that it should make a major impact either way. – ggorlen Feb 09 '21 at 23:19
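A sketch of the "parse the whole thing and only access what you need" approach from the comment above, using made-up sample JSON in the api.weather.gov response shape (key names are assumptions based on that API):

```python
import json

def latest_period(raw_json):
    # Parse the full JSON payload, then keep only the first (most recent)
    # hourly period instead of printing all ~155 of them.
    data = json.loads(raw_json)
    return data['properties']['periods'][0]

# Hypothetical two-period payload for illustration
raw = ('{"properties": {"periods": ['
       '{"number": 1, "shortForecast": "Sunny"}, '
       '{"number": 2, "shortForecast": "Rain"}]}}')
print(latest_period(raw)['shortForecast'])  # → Sunny
```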

0 Answers