
I have a program which fetches an XML document from a URL. It contains a lot of data, so I can only see 2500 'profiles', if you will, at a time.

From these XML profiles, I need the program to pull out each user's ID number, which is an 8-digit code. I also need it to pull out the URL of the next 2500 profiles, which I did using the endswith() function.

My problem is that the last page of data has no link to match, and I need the loop to stop there while still pulling the final set of IDs.

Here is what I have so far:

import requests  # assuming the third-party requests library is what is meant

myUrl = 'blah'

while myUrl != '':
    info = requests.get(myUrl)

Here I convert the response to a list of strings.

    end_of_new_link = "thingy"
    for link in links:  # links is the list of strings from the previous step
        if link.endswith(end_of_new_link):
            myUrl = link

Here I format the link so I can use it on the next iteration of the while loop.

        elif link.startswith(IDNUMBER):
            listIDs.append(link)

Is there a way I can set the variable myUrl to an empty string to exit the while loop, or is my logic all wrong here?

johnfk3

1 Answer


I think the simplest way is to have two variables instead of one.

lastUrl, nextUrl = None, 'blah'

while nextUrl != lastUrl:
    # url gets consumed and becomes "old"
    info, lastUrl = requests.get(nextUrl), nextUrl

Later on, still inside the while loop:

    end_of_new_link = "thingy"
    for link in links:  # links is your list of strings for this page
        if link.endswith(end_of_new_link):
            nextUrl = link  # now it's different, so the loop will continue
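
Putting the two fragments together, a minimal runnable sketch might look like the following. The PAGES data, the fetch_lines helper, and the "ID" prefix are made up for illustration; in the real program fetch_lines would wrap the HTTP request and the conversion to a list of strings.

```python
# Hypothetical stand-in for the real site: each "page" is already a list of
# strings. The URLs, the "thingy" suffix and the "ID" prefix are invented.
PAGES = {
    "p1-thingy": ["ID00000001", "ID00000002", "p2-thingy"],
    "p2-thingy": ["ID00000003", "p3-thingy"],
    "p3-thingy": ["ID00000004", "ID00000005"],  # last page: no next link
}


def fetch_lines(url):
    """Stand-in for the real fetch plus conversion to a list of strings."""
    return PAGES[url]


def collect_ids(start_url, end_of_new_link="thingy", id_prefix="ID"):
    list_ids = []
    last_url, next_url = None, start_url
    # Keep going only while the previous page gave us a new URL.
    while next_url != last_url:
        # The URL gets consumed and becomes "old".
        lines, last_url = fetch_lines(next_url), next_url
        for link in lines:
            if link.endswith(end_of_new_link):
                next_url = link  # differs from last_url, so the loop continues
            elif link.startswith(id_prefix):
                list_ids.append(link)
    return list_ids


if __name__ == "__main__":
    print(collect_ids("p1-thingy"))
```

On the last page nothing reassigns next_url, so it still equals last_url and the loop ends after that page's IDs have been collected. Note that a page linking to itself would also end the loop.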

Of course, you could make this unnecessarily abstract if you wanted to and have a wrapper object that marks if its encapsulated data has changed (or simply has been set) since the last read.
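
For what it's worth, such a wrapper could be sketched roughly as below. The Tracked class, its method names, and the tiny PAGES dict are all hypothetical and exist only to show the "has it been set since the last read" idea.

```python
class Tracked:
    """Holds a value and remembers whether it was set since the last read."""

    def __init__(self, value):
        self._value = value
        self._fresh = True  # a newly created value counts as unread

    def set(self, value):
        self._value = value
        self._fresh = True

    def read(self):
        self._fresh = False  # reading consumes the value
        return self._value

    @property
    def fresh(self):
        return self._fresh


# Invented page data: each URL maps to the list of strings found on that page.
PAGES = {"a-thingy": ["b-thingy"], "b-thingy": []}


def crawl(start):
    url = Tracked(start)
    visited = []
    while url.fresh:  # stop once a page did not set a new URL
        current = url.read()
        visited.append(current)
        for link in PAGES[current]:
            if link.endswith("thingy"):
                url.set(link)
    return visited


if __name__ == "__main__":
    print(crawl("a-thingy"))
```

Unlike the two-variable version, this one would keep looping if a page linked to itself, because setting the same value still marks it as fresh.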

Shashank