-5

I'm using the requests module to collect some data from a website. This application runs once every day. The number of rows I get changes every time; per request I can get a maximum of 250 rows. If there are more than 250 rows, the API gives me a follow-up link which can be used to get rows 251-500, and so on.

Now I have a problem: sometimes there are fewer than 250 rows, which means there is no follow-up link to use, and that's exactly where my program gives the following error:

KeyError: @odata.nextLink

This is a piece of the application:

    proxies = {'https': 'proxy.***.***.com:8080'}
    headers = {"grant_type": "password", 
              "username": "****", 
              "password": "****", 
              "persistent": "true", 
              "device": '{"DeviceUniqueId":"b680c452","Name":"Chrome","DeviceVersion":"36","PlatformType":"Browser"}'}

    url_1 = 'https://****-***.com/odata/Results'
   

    params_1 = (
             ('$filter', mod_date),
             ('$count', 'true'),
             ('$select', 'Status'),
             ('$expand', 'Result($select=ResultId),Specification($select=Name), SpecificationItem($select=Name,MinimumValue, MaximumValue)\n\n'),)
    
    response_1 = requests.get(url_1, headers=headers, proxies=proxies, params=params_1)
    q_1 = response_1.json()

    next_link_1 = q_1['@odata.nextLink']  # the KeyError is raised here when there is no next page
    q_1 = [tuple(q_1.values())]

    while next_link_1:
        new_response_1 = requests.get(next_link_1, headers=headers, proxies=proxies)
        new_data_1 = new_response_1.json()
        q_1.append(tuple(new_data_1.values()))
        next_link_1 = new_data_1.get('@odata.nextLink', None)

Now I actually want Python to only read the variable next_link_1 if it's available; otherwise it should just ignore it and collect what is available...

Premier12
  • It looks like you already have the answer in your code: `new_data_1.get('@odata.nextLink')` (you don't need the `, None` since `None` is already the default return value for `dict.get`). – Iguananaut Oct 14 '21 at 12:23
  • @Iguananaut, but when I debug the program I get the `KeyError` at the line `next_link_1 = q_1['@odata.nextLink']`. – Premier12 Oct 14 '21 at 12:24
  • Right, so why not use `dict.get` there too? – Iguananaut Oct 14 '21 at 12:28
  • Can I do something about the fact that it takes so much time to get the output of this in my console? It seems like the server is responding really slowly. – Premier12 Oct 14 '21 at 12:41

2 Answers

1

You only want to enter the while loop when `q_1` has the key `'@odata.nextLink'`. Inside the while loop, this is already accomplished in the line `next_link_1 = new_data_1.get('@odata.nextLink', None)`. You could use the same approach (setting `next_link_1` to `None` if there is no next link) before the while loop:

next_link_1 = q_1.get('@odata.nextLink', None)

This can be simplified to

next_link_1 = q_1.get('@odata.nextLink')

as `None` is already the default value returned by `dict.get()` when the key is missing.

NB: The question title is wrong. The variable always exists, as you are setting it. Only the existence of the key @odata.nextLink is fragile. So, what you actually want to do is check the existence of a key in a dictionary. To understand what is going on, you should familiarize yourself with the dict.get() method.
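To make the difference concrete, here is a minimal sketch of the three ways of looking up a key that may be missing. The `page` dictionary is made-up sample data standing in for the parsed JSON of a last page; it is not taken from the real API.

```python
# Hypothetical last page of results: it carries no '@odata.nextLink' key.
page = {'value': [{'Status': 'OK'}]}

# Subscript access raises KeyError when the key is missing ...
try:
    page['@odata.nextLink']
    raised = False
except KeyError:
    raised = True

# ... while dict.get() returns None (or a default you pass as second argument).
next_link = page.get('@odata.nextLink')

# A membership test answers "is the key there?" without reading the value.
has_link = '@odata.nextLink' in page

print(raised, next_link, has_link)  # True None False
```

Since `None` is falsy, the result of `dict.get()` can be used directly as the condition of `while next_link_1:`.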

There is also some obvious refactoring possible here, getting rid of the repetition of the first iteration, and moving it into the loop:

proxies = {'https': 'proxy.***.***.com:8080'}
headers = {
    'grant_type': 'password', 
    'username': '****', 
    'password': '****', 
    'persistent': 'true', 
    'device': '{"DeviceUniqueId":"b680c452","Name":"Chrome","DeviceVersion":"36","PlatformType":"Browser"}'
}
params = (
    ('$filter', mod_date),
    ('$count', 'true'),
    ('$select', 'Status'),
    ('$expand', 'Result($select=ResultId),Specification($select=Name), SpecificationItem($select=Name,MinimumValue, MaximumValue)\n\n'),
)

url = 'https://****-***.com/odata/Results'
data = []
while url:
    response = requests.get(
        url, 
        headers=headers, 
        proxies=proxies, 
        params=params,
    )
    response_data = response.json()
    data.append(tuple(response_data.values()))
    url = response_data.get('@odata.nextLink')
    params = tuple()
Jonathan Scholbach
  • So if I'm right, I should use `next_link_1 = q_1.get('@odata.nextLink', None)` before the loop but after `q_1`? – Premier12 Oct 14 '21 at 12:36
  • @Premier12 Yes. Or simply `q_1.get('@odata.nextLink')`. But more than that you should start trying to understand what the code you copy-pasted is actually doing :) That would have enabled you to solve your problem on your own. – Jonathan Scholbach Oct 14 '21 at 12:38
0

Use get in both places. Better yet, restructure your loop so that you only need one call.

proxies = {'https': 'proxy.***.***.com:8080'}
headers = {...}

url = 'https://****-***.com/odata/Results'

params = (...)

qs = []
next_link = url
get_args = {'headers': headers, 'proxies': proxies, 'params': params}
while True:
    response = requests.get(next_link, **get_args)
    q = response.json()
    qs.append(tuple(q.values()))
    if (next_link := q.get('@odata.nextLink', None)) is None:
        break
    if 'params' in get_args:
        del get_args['params']  # Only needed in the first iteration

(I'm not terribly excited about how we ensure params is used only on the first iteration, but I think it's better than duplicating the process of defining next_link before the loop starts. Maybe something like this would be an improvement?

get_args = {...}  # As above
new_get_args = dict(headers=..., proxies=...)  # Same, but without params

while True:
    ...
    if (next_link := ...) is None:
        break
    get_args = new_get_args

Repeated assignment to get_args is probably cheaper than repeatedly testing for and deleting the params key, at the cost of having a second dict in memory. You could even drop that after the first iteration by adding a second assignment new_get_args = get_args to the end of the loop, which would result in a pair of do-nothing assignments for later iterations.)
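Another way to keep the "params only on the first request" detail out of the calling code is to wrap the loop in a generator. The following is only a sketch under assumptions: `iter_pages`, `fake_fetch` and `FAKE_PAGES` are hypothetical names introduced for illustration, and the fetch function is passed in so the example runs without a network connection.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_pages(
    fetch: Callable[[str, Optional[dict]], Page],
    url: Optional[str],
    params: Optional[dict] = None,
) -> Iterator[Page]:
    """Yield each page, following '@odata.nextLink' until it is absent."""
    while url:
        page = fetch(url, params)
        yield page
        url = page.get('@odata.nextLink')  # None on the last page ends the loop
        params = None  # the next link already encodes the query


# Stand-in for the real API so the sketch runs offline (made-up data).
FAKE_PAGES = {
    'page1': {'value': [1, 2], '@odata.nextLink': 'page2'},
    'page2': {'value': [3]},
}
calls = []


def fake_fetch(url, params):
    calls.append((url, params))  # record what was requested
    return FAKE_PAGES[url]


rows = [
    row
    for page in iter_pages(fake_fetch, 'page1', {'$count': 'true'})
    for row in page['value']
]
print(rows)  # [1, 2, 3]
```

With requests, `fetch` could be a small function that returns `requests.get(url, headers=headers, proxies=proxies, params=params).json()`; passing `params=None` simply sends no extra query parameters.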

chepner