I'm trying to develop a web scraper that loops through a JSON object, responses, a finite number of times for testing purposes. My program so far:

links = []

i = 0
while i < 3:
    for response[i] in responses:
        url = response[i]["PlayerProfile"]
        playername = response[i]["playername"]
        browser = init_browser()
        browser.visit(url)
        html = browser.html
        soup = bs(html, 'html.parser')
        img_url = soup.find("img").text
        links.append({
            "playername": playername,
            "img_url": img_url
        })
i += 1

The loop runs for more than 3 iterations; the while loop doesn't seem to work. I'd like to view the output first before setting the loop to run 3,000 times. Where is my mistake?

Martijn Pieters
danielvaldes

2 Answers

I think you need to indent i += 1 another level so it increments for each response. (Also, you don't need the semicolon.) But I don't think you really need both a for loop and a while loop to do this. How about something like this:

for e, i in enumerate(responses):
    if e > 2:
        break
    # edit all variables like this
    url = i["PlayerProfile"]
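
For reference, a self-contained sketch of the same enumerate-and-break idea. The sample responses list and its example.com URLs are made up for illustration, and the browser and parsing steps from the question are left out:

```python
# Hypothetical sample data; the real responses come from the JSON being scraped.
responses = [
    {"PlayerProfile": f"https://example.com/player/{n}", "playername": f"player{n}"}
    for n in range(10)
]

visited = []
for index, response in enumerate(responses):
    if index > 2:  # stop once the first three items have been handled
        break
    # In the real scraper, the browser visit and parsing would go here.
    visited.append((response["playername"], response["PlayerProfile"]))

print(len(visited))  # 3
```

Using descriptive names such as index and response instead of e and i makes it harder to mix up the counter with the item.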
Matt L.

Look at this line:

for response[i] in responses:

This makes response[i] the target of the for loop; each element of responses is assigned to response[i]. You don't show what response is in your code; perhaps it's a list with at least 3 elements, or it's a dictionary. Either way, it is being altered.

The syntax is for target in iterable: body, where Python takes each element from iterable, assigns it to target, and executes the body.
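
A minimal sketch of that rule, showing that the target can be any assignable expression, including a subscription. The names slots and seen are made up for illustration:

```python
# Hypothetical names, used only to illustrate the for-loop target rule.
slots = [None, None, None]
seen = []

# The loop target is slots[0], so each element is assigned into that slot
# before the body runs.
for slots[0] in ["foo", "bar"]:
    seen.append(slots[0])

print(seen)   # ['foo', 'bar']
print(slots)  # ['bar', None, None]
```

After the loop, slots[0] still holds the last element that was assigned to it.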

So, if responses is ['foo', 'bar', 'baz', 'spam'], then the combination of your while loop and your for loop is this:

  1. while loop starts with i = 0, then the for loop runs with response[0] as the target, setting:

    • response[0] = 'foo'
    • response[0] = 'bar'
    • response[0] = 'baz'
    • response[0] = 'spam'


    and your for loop body then uses response[0] with each value in turn.

  2. while continues with i = 1, then the for loop runs with response[1] as the target, setting:

    • response[1] = 'foo'
    • response[1] = 'bar'
    • response[1] = 'baz'
    • response[1] = 'spam'


    and your for loop body then uses response[1] with each value in turn.

  3. while continues with i = 2, then the for loop runs with response[2] as the target, setting:

    • response[2] = 'foo'
    • response[2] = 'bar'
    • response[2] = 'baz'
    • response[2] = 'spam'


    and your for loop body then uses response[2] with each value in turn.

In the end, you'll have a response object with values for 0, 1 and 2 all set to 'spam'.
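
The walkthrough above can be reproduced with a short sketch. It assumes response is a dictionary (the question doesn't show it) and that i += 1 sits inside the while loop; body_runs is a made-up counter standing in for the scraping work:

```python
responses = ["foo", "bar", "baz", "spam"]
response = {}   # assumption: the question does not show what response is
body_runs = 0   # hypothetical counter standing in for the loop body

i = 0
while i < 3:
    # Every element of responses is assigned to response[i] in turn.
    for response[i] in responses:
        body_runs += 1
    i += 1

print(body_runs)  # 12
print(response)   # {0: 'spam', 1: 'spam', 2: 'spam'}
```

The body runs 3 × 4 = 12 times rather than 3, which is why the loop appears to ignore the while condition.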

Note that you already have a while loop, so you don't need a for loop as well. Use response = responses[i] to do the assignment yourself:

while i < 3:
    response = responses[i]

    # ...
    i += 1

or you can use a for loop over a range() object to give you the increasing i values:

for i in range(3):
    response = responses[i]

or you can use itertools.islice() to limit the iteration to the first 3 items:

from itertools import islice

for response in islice(responses, 3):
    # use response

You can also directly slice responses if it is a sequence object (a list or tuple):

for response in responses[:3]:
    # use response

but this requires creating a copy of that part of the responses sequence first!
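
To illustrate the difference between the last two options, here is a small sketch. generate_responses is a hypothetical stand-in for a lazily produced stream of responses, not something from the question:

```python
from itertools import islice


def generate_responses():
    """Hypothetical generator standing in for a lazy stream of responses."""
    for n in range(1000):
        yield {"playername": f"player{n}"}


# islice() works on any iterable and stops after 3 items without copying.
first_three = [r["playername"] for r in islice(generate_responses(), 3)]
print(first_three)  # ['player0', 'player1', 'player2']

# Direct slicing only works on sequences; a generator is not subscriptable.
try:
    generate_responses()[:3]
except TypeError:
    slicing_failed = True
else:
    slicing_failed = False

# Once the data is a list, slicing works, at the cost of copying the slice.
as_list = list(generate_responses())
from_slice = [r["playername"] for r in as_list[:3]]
```

For a short list of 3 items the copy is negligible; islice() mainly matters when responses is large or not a sequence at all.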

Martijn Pieters