I'm trying to fill a dataframe with historical hourly weather data by calling the DarkSky API. However, certain fields are sometimes missing from the response, which raises a KeyError.

Here's what the API sends back for each hour:

'summary': 'Mostly cloudy throughout the day.',
'icon': 'partly-cloudy-day',
'data': [{
   'time': 1528354800,
   'summary': 'Partly Cloudy',
   'icon': 'partly-cloudy-night',
   'precipIntensity': 0,
   'precipProbability': 0,
   'temperature': 12.94,
   'apparentTemperature': 12.94,
   'dewPoint': 9.36,
   'humidity': 0.79,
   'pressure': 1011.4,
   'windSpeed': 2.69,
   'windGust': 2.69,
   'windBearing': 252,
   'cloudCover': 0.33,
   'uvIndex': 0,
   'visibility': 13.818}]

So when filling my dataframe I get a KeyError, because sometimes precipIntensity and precipProbability are absent and the response has a single field called precipType instead.

Here's how I'm trying to fill the dataframe:

from datetime import datetime
from darksky import forecast  # assuming the darkskylib package

VICTORIA = 48.407326, -123.329773
# month and day come from an enclosing loop (not shown)
dt = datetime(2018, month, day).isoformat()
weather = forecast('APIKEY', *VICTORIA, time=dt)
weather.refresh(units='si')
for hour in weather['hourly']['data']:
    daily_weather = daily_weather.append(
        {'time': hour['time'],
         'realtime': datetime.fromtimestamp(hour['time']),
         'summary': hour['summary'],
         'icon': hour['icon'],
         'precipIntensity': hour['precipIntensity'],
         'precipProbability': hour['precipProbability'],
         'temperature': hour['temperature'],
         'apparentTemperature': hour['apparentTemperature'],
         'dewPoint': hour['dewPoint'],
         'humidity': hour['humidity'],
         'pressure': hour['pressure'],
         'windSpeed': hour['windSpeed'],
         'windBearing': hour['windBearing'],
         'cloudCover': hour['cloudCover'],
         'uvIndex': hour['uvIndex'],
         'visibility': hour['visibility'],
         }, ignore_index=True)

I've tried to handle the missing fields with a try/except statement like so:

for hour in weather['hourly']['data']:
        daily_weather = daily_weather.append(
        {'time': hour['time'],
         'realtime': datetime.fromtimestamp(hour['time']),
         'summary': hour['summary'],
         'icon': hour['icon'],
         'temperature': hour['temperature'],
         'apparentTemperature': hour['apparentTemperature'],
         'dewPoint': hour['dewPoint'],
         'humidity': hour['humidity'],
         'pressure': hour['pressure'],
         'windSpeed': hour['windSpeed'],
         'windBearing': hour['windBearing'],
         'cloudCover': hour['cloudCover'],
         'uvIndex': hour['uvIndex'],
         'visibility': hour['visibility'],
         }, ignore_index=True)
        try:
            daily_weather = daily_weather.append({'precipIntensity': hour['precipIntensity'], 'precipProbability': hour['precipProbability']}, ignore_index=True)
        except KeyError:
            daily_weather = daily_weather.append({'precipType': hour['precipType']}, ignore_index=True)

However, the precipIntensity values end up in separate, otherwise empty rows instead of alongside the other fields:

[Image: Dataframe Output]

I'd love some advice on how to use exception statements when trying to fill a dataframe. Thank you.

Kate Orlova
  • Rather than a try/except, you _could_ have an intermediate step of filling a dictionary based on the fields in the response. The [dict.get](https://stackoverflow.com/questions/11041405/why-dict-getkey-instead-of-dictkey) method can be used to fill in a default `np.nan` value where you don't have data, then you make your row based on the constructed dict instead of the response – G. Anderson Jun 18 '19 at 19:03
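A minimal sketch of the dict.get idea from this comment, not the commenter's own code. The sample record, the FIELDS tuple and the build_row helper are hypothetical; math.nan stands in for np.nan (both are the same float NaN) so the snippet needs only the standard library, and the timestamp is converted in UTC rather than local time to keep the result reproducible.

```python
import math
from datetime import datetime, timezone

# Hypothetical hourly record, shaped like the response in the question but
# without precipIntensity/precipProbability and with precipType instead.
hour = {
    'time': 1528354800,
    'summary': 'Partly Cloudy',
    'temperature': 12.94,
    'precipType': 'rain',
}

# Shortened field list for illustration; extend it with the other columns.
FIELDS = ('time', 'summary', 'temperature',
          'precipIntensity', 'precipProbability', 'precipType')


def build_row(hour, fields=FIELDS, default=math.nan):
    """Return one row dict; missing keys get `default` instead of raising KeyError."""
    row = {field: hour.get(field, default) for field in fields}
    # 'time' is always expected, so plain indexing is fine here.
    row['realtime'] = datetime.fromtimestamp(hour['time'], tz=timezone.utc)
    return row


row = build_row(hour)
print(row)
```

Every row then has the same keys, so the resulting columns line up regardless of which optional fields a given hour contains.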

1 Answer

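You're creating two different rows in your output with the two calls to append in your code: the first append adds the row with the general fields, and the second adds another row that holds only the precipitation fields. Save the dict for each row in a local variable, populate it, and then append it once.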
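To illustrate the difference, here is a small sketch in which plain lists stand in for the dataframe (an assumption made to keep it dependency-free; the sample record is made up). Each append adds one record, so splitting the fields across two appends produces two records per hour.

```python
# Hypothetical hourly record with the precipitation fields present.
hour = {'time': 1528354800, 'temperature': 12.94,
        'precipIntensity': 0, 'precipProbability': 0}

# Two appends per hour: the precipitation values end up in their own record.
split_rows = []
split_rows.append({'time': hour['time'], 'temperature': hour['temperature']})
split_rows.append({'precipIntensity': hour['precipIntensity'],
                   'precipProbability': hour['precipProbability']})

# One dict per hour, filled in steps and appended once.
merged_rows = []
row = {'time': hour['time'], 'temperature': hour['temperature']}
for field in ('precipIntensity', 'precipProbability', 'precipType'):
    if field in hour:
        row[field] = hour[field]
merged_rows.append(row)

print(len(split_rows), len(merged_rows))  # prints "2 1"
```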

For readability I would also recommend a straightforward if check rather than try/except. You could even automate it for multiple optional fields.

Example (not tested):

for hour in weather['hourly']['data']:
    # Build one dict per hour so every value lands in the same row.
    row = {
        'time': hour['time'],
        'realtime': datetime.fromtimestamp(hour['time']),
        'summary': hour['summary'],
        'icon': hour['icon'],
        'temperature': hour['temperature'],
        'apparentTemperature': hour['apparentTemperature'],
        'dewPoint': hour['dewPoint'],
        'humidity': hour['humidity'],
        'pressure': hour['pressure'],
        'windSpeed': hour['windSpeed'],
        'windBearing': hour['windBearing'],
        'cloudCover': hour['cloudCover'],
        'uvIndex': hour['uvIndex'],
        'visibility': hour['visibility'],
    }
    # Copy the optional precipitation fields only when the API returned them.
    for field in ('precipIntensity', 'precipProbability', 'precipType'):
        if field in hour:
            row[field] = hour[field]
    # Append once per hour; DataFrame.append returns a new frame.
    daily_weather = daily_weather.append(row, ignore_index=True)

Or to make it even neater:

fields = ('time', 'summary', 'icon', 'temperature', 'apparentTemperature', 'dewPoint', 'humidity', 'pressure', 'windSpeed', 'windBearing', 'cloudCover', 'uvIndex', 'visibility', 'precipIntensity', 'precipProbability', 'precipType')

for hour in weather['hourly']['data']:
    row = {
        'realtime': datetime.fromtimestamp(hour['time'])
    }
    # Copy every field that is present; absent ones are simply skipped.
    for field in fields:
        if field in hour:
            row[field] = hour[field]
    # Append once per hour; DataFrame.append returns a new frame.
    daily_weather = daily_weather.append(row, ignore_index=True)
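
As a possible variation (a sketch, not part of the tested-by-nobody example above): the row dicts could be collected in a plain list and turned into a dataframe in one step. This also sidesteps DataFrame.append, which was removed in pandas 2.0. The two sample rows are made up, and realtime is derived with pd.to_datetime here, which yields UTC-based timestamps rather than the local times datetime.fromtimestamp gives.

```python
import pandas as pd

# Hypothetical row dicts, as the loop above would produce them; the second
# hour lacks the precipitation numbers and carries precipType instead.
rows = [
    {'time': 1528354800, 'temperature': 12.94,
     'precipIntensity': 0, 'precipProbability': 0},
    {'time': 1528358400, 'temperature': 12.31, 'precipType': 'rain'},
]

# Build the frame once; keys missing from a row become NaN in that row.
daily_weather = pd.DataFrame(rows)

# Convert the Unix timestamps (seconds) to datetimes for the whole column.
daily_weather['realtime'] = pd.to_datetime(daily_weather['time'], unit='s')

print(daily_weather)
```

Building the frame once from a list is usually also faster than appending row by row, since each append copies the whole frame.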
bertilnilsson