
I am pulling data in from an API that limits the number of records per request to 100. There are 7274 records in total and everything is returned as JSON.

I want to concatenate all 7274 records into a single variable/object and eventually export to a JSON file.

The response JSON objects are structured like this:

{"data":[{"key1":87,"key2":"Ottawa",..."key21":"ReggieWatts"}],"total":7274}

I just want the objects inside the "data" array so that the output looks like this:

 {'key1': 87, 'key2': 'Ottawa', 'key21': 'ReggieWatts'},{'key1': 23, 'key2': 'Cincinnati', 'key21': 'BabeRuth'},... 

I’ve tried without success to use the dict.update() method to concatenate the new values to a variable that’s collecting all the records.

I am getting this error: ValueError: dictionary update sequence element #0 has length 21; 2 is required

Here’s the stripped-down code.

import json
import time
import random
import requests 
from requests.exceptions import HTTPError


api_request_limit = 100
# total_num_players = 7274
total_num_players = 201 # only using 201 for now so that we aren't hammering the api while testing
start_index = 0
base_api_url = "https://api.nhle.com/stats/rest/en/skater/bios?isAggregate=true&isGame=false&sort=[{%22property%22:%22playerId%22,%22direction%22:%22ASC%22}]&limit=100&factCayenneExp=gamesPlayed%3E=1&cayenneExp=gameTypeId=2%20and%20seasonId%3C=20202021%20and%20seasonId%3E=19171918&start="
player_data = {}
curr_data = {}

for curr_start_index in range(start_index, total_num_players, api_request_limit):
    api_url = base_api_url + str(curr_start_index)
    
    try:
        response = requests.get(api_url)
        # if successful, no exceptions
        response.raise_for_status()
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')
    except Exception as err:
        print(f'Other error occurred: {err}')
    else:
        # print('Success!')
        curr_data = response.json()

    player_data.update(curr_data['data'])
    # player_data = {**player_data, **curr_data['data']} # Does not work either
    # print(player_data)


    # for i in curr_skaters['data']:
    #     print(str(i['firstSeasonForGameType']) + ": " + str(i['skaterFullName'])  + " " + str(i['playerId']))
    
    set_delay = (random.random() * 2) + 1
    time.sleep(set_delay)

Should I be iterating through each of the 100 records individually to add them to player_data?

The ValueError suggests the problem is the number of key:value pairs in each object, which tells me I'm using the .update() method incorrectly here.

Thanks

sspboyd
  • How would you do it if the data *didn't* come from JSON? Other than that, we can't help you without a [*complete* but minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). You should also [show the complete error message](https://meta.stackoverflow.com/questions/359146/why-should-i-post-complete-errors-why-isnt-the-message-itself-enough) – Karl Knechtel Apr 23 '21 at 16:55
  • Getting a different error than you claim. See @karl Knechtel's comment. `Traceback (most recent call last): File "ssss.py", line 29, in <module> player_data.update(curr_data['data']) KeyError: 'data'` – Gulzar Apr 23 '21 at 16:56
  • Also getting `Other error occurred: HTTPSConnectionPool(host='api.com', port=443): Max retries exceeded with url: /stats/rest/?start=0 (Caused by SSLError(SSLCertVerificationError("hostname 'api.com' doesn't match either of 'www3.gehealthcare.com', 'apps.gehealthcare.com', 'beta-ae.gehealthcare.com', 'beta-africa.gehealthcare.com', 'beta-at.gehealthcare.com', 'beta-au.gehealthcare.com', 'beta-be.gehealthcare.com', 'beta-bg.gehealthcare.com', 'beta-br.gehealthcare.com', 'beta-ca.gehealthcare.com', 'beta-ch.gehealthcare.com', 'beta-cn.gehealthcare.com', ...` Please make runnable code. – Gulzar Apr 23 '21 at 16:58
  • I am working on a reproducible example now. Meanwhile, the complete error message is: ```Traceback (most recent call last): File "/Stats/main.py", line 39, in <module> player_data.update(curr_data['data']) ValueError: dictionary update sequence element #0 has length 21; 2 is required``` – sspboyd Apr 23 '21 at 17:02
  • "Please make runnable code." Done :) – sspboyd Apr 23 '21 at 17:22

2 Answers


I figured it out with a big thanks to William.

I also found this post very helpful. https://stackoverflow.com/a/26853961/610406

Here's the fix I eventually landed on:

import json
import time
import random
import requests
from requests.exceptions import HTTPError

api_request_limit = 100
# total_num_players = 7274 # skaters
total_num_players = 201 # only using 201 for now so that we aren't hammering the api while testing
start_index = 0
base_api_url_skaters = "https://api.nhle.com/stats/rest/en/skater/bios?isAggregate=true&isGame=false&sort=[{%22property%22:%22playerId%22,%22direction%22:%22ASC%22}]&limit=100&factCayenneExp=gamesPlayed%3E=1&cayenneExp=gameTypeId=2%20and%20seasonId%3C=20202021%20and%20seasonId%3E=19171918&start="

player_data = [] # Needs to be a list.
curr_data = {}

for curr_start_index in range(start_index, total_num_players, api_request_limit):
    api_url = base_api_url_skaters + str(curr_start_index)
    
    try:
        response = requests.get(api_url)
        # if successful, no exceptions
        response.raise_for_status()
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')
    except Exception as err:
        print(f'Other error occurred: {err}')
    else:
        # print('Success!')
        curr_data = response.json()

        # *** >>>This line is what I needed! So simple in retrospect.<<< ***
        player_data = [*player_data, *curr_data['data']] 

    set_delay = (random.random() * 3) + 1
    time.sleep(set_delay)
    print(f'Counter: {curr_start_index}. Delay: {set_delay}. Record Count: {len(player_data)}.')

with open('nhl_skaters_bios_1917-2021.json', 'w') as f:
    json.dump(player_data,f)

As a Gist:
https://gist.github.com/sspboyd/68ec8f5c5cd15ee7467d4326e3b74111
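
For context, here is a minimal sketch of why the original `dict.update()` call failed and of `list.extend()` as an in-place alternative to the unpacking line above. The two `page` dictionaries are made-up stand-ins for API responses (three keys each rather than 21), so no network access is needed.

```python
# Hypothetical stand-ins for two API responses, shaped like the question's JSON.
page0 = {"data": [{"key1": 87, "key2": "Ottawa", "key21": "ReggieWatts"}], "total": 2}
page1 = {"data": [{"key1": 23, "key2": "Cincinnati", "key21": "BabeRuth"}], "total": 2}

# dict.update() given a list expects every element to be a (key, value) pair.
# Each element here is a whole record with three keys, so it raises ValueError.
try:
    {}.update(page0["data"])
except ValueError as err:
    update_error = str(err)
    print(update_error)

# Collecting into a list avoids the problem; extend() appends every record
# from the page without rebuilding the list on each iteration.
player_data = []
for page in (page0, page1):
    player_data.extend(page["data"])

print(len(player_data))
```

With the real responses each record has 21 keys, which is where the "has length 21; 2 is required" part of the message comes from.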

sspboyd

If you want them all in a single dictionary that can be exported to a JSON file, each response needs its own unique key. Perhaps the following will accomplish what you want:

response0 = {"data":[{"key1":87,"key2":"Ottawa","key21":"ReggieWatts"}],"total":7274}
response1 = {"data":[{"key1":23,"key2":"Cincinnati","key21":"BabeRuth"}],"total":4555}

all_data = {}
for i, resp in enumerate([response0, response1]):
    all_data[f'resp{i}'] = resp['data'][0]

This produces

all_data = {'resp0': {'key1': 87, 'key2': 'Ottawa', 'key21': 'ReggieWatts'},
 'resp1': {'key1': 23, 'key2': 'Cincinnati', 'key21': 'BabeRuth'}}
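
If each page holds many records, one key per response is not enough. A possible variation, sketched below, keys every record by one of its own fields. This assumes `key1` is unique across all records, which the question does not confirm, so treat that choice as hypothetical.

```python
response0 = {"data": [{"key1": 87, "key2": "Ottawa", "key21": "ReggieWatts"}], "total": 7274}
response1 = {"data": [{"key1": 23, "key2": "Cincinnati", "key21": "BabeRuth"}], "total": 7274}

all_data = {}
for resp in [response0, response1]:
    for record in resp["data"]:
        # Assumption: "key1" uniquely identifies a record.
        # A duplicate value would silently overwrite the earlier record.
        all_data[record["key1"]] = record

print(all_data[23]["key2"])
```

Note that `json.dump` writes integer dictionary keys as strings, so the keys would come back as `"87"` and `"23"` when the file is loaded again.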

Edit: I initially went for a dictionary since I think it saves more naturally as JSON, but to get a Python list instead, you can use the following:

all_data = []
for resp in [response0, response1]:
    all_data.append(resp['data'][0])
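
The loop above keeps only the first record of each response (`resp['data'][0]`), which is fine for these one-record examples. For pages that carry several records, a rough sketch using `itertools.chain.from_iterable` flattens every page into one list. The second record in `response0` is invented purely for illustration.

```python
from itertools import chain

# Made-up responses; the "Boston" record is a hypothetical extra entry.
response0 = {
    "data": [
        {"key1": 87, "key2": "Ottawa", "key21": "ReggieWatts"},
        {"key1": 12, "key2": "Boston", "key21": "BobbyOrr"},
    ],
    "total": 3,
}
response1 = {"data": [{"key1": 23, "key2": "Cincinnati", "key21": "BabeRuth"}], "total": 3}

# Chain the "data" lists of all responses together, preserving order.
all_data = list(chain.from_iterable(resp["data"] for resp in [response0, response1]))

print(len(all_data))
```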

Finally, this object can easily be saved as JSON:

import json
with open('saved_responses.json', 'w') as file:
    json.dump(all_data, file)
William
  • Thanks William, that gets me pretty close. I am looking to output an object that just has the data objects like this. ```all_data = [{'key1': 87, 'key2': 'Ottawa', 'key21': 'ReggieWatts'},{'key1': 23, 'key2': 'Cincinnati', 'key21': 'BabeRuth'}]``` – sspboyd Apr 23 '21 at 17:03
  • Ah I see, check out my edit. Dictionaries save more naturally to JSON I think, but Python lists should also be JSON saveable. – William Apr 23 '21 at 17:09