Beautiful soup, list index out of range

Question

I looked at the site's HTML source and found what I need for namePlayer: it is in the 4th column, inside an 'a' tag. I tried to read it in answers.append with 'namePlayer': cols[3].a.text

But when I run it, I get an IndexError. I tried changing the index to 2, 3, 4 and 5, but nothing helped.

Issue: why do I get IndexError: list index out of range when everything looks OK (I think :D)?

source:

#!/usr/bin/env python3

import re
import urllib.request
from bs4 import BeautifulSoup

class AppURLopener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"


def get_html(url):
    opener = AppURLopener()
    response = opener.open(url)
    return response.read()

def parse(html):
    soup = BeautifulSoup(html)
    table = soup.find(id='answers')

    answers = []

    for row in table.find_all('div')[16:]:
        cols = row.find_all('div')

    answers.append({
        'namePlayer': cols[3].a.text
    })


    for answer in answers:
        print(answers)


def main():
    parse(get_html('http://jaze.ru/forum/topic?id=50&page=1'))

if __name__ == '__main__':
    main()

Double check what `cols` is. Apparently it doesn't have 4 elements. — Carcigenicate, Oct 20 '19 at 18:07
It can't, or it wouldn't cause that error. Make sure the data that you're trying to get isn't inside of another structure like an outer list or dictionary. — Carcigenicate, Oct 20 '19 at 19:05

score 0 · Answer 1 · answered Oct 20 '19 at 18:14

It sounds like you are using an index for which no list element exists. Remember that indexing starts at 0, so a list of four elements has the indices 0, 1, 2, 3. If I ask for element 10 of such a list, I get an IndexError.
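A minimal sketch of the point above. The list contents and the `safe_get` helper are hypothetical, made up only to illustrate zero-based indexing and one way to guard against an out-of-range index:

```python
# Hypothetical four-element list standing in for `cols`
cols = ["id", "date", "rank", "name"]


def safe_get(items, index, default=None):
    """Return items[index], or default when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default


print(cols[3])             # the last valid index is len(cols) - 1
print(safe_get(cols, 10))  # out of range, so the default is returned
```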

user251499


QHarr · Accepted Answer · 2019-10-21T18:12:25.750

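You are overwriting cols on every iteration of your loop, and answers.append sits outside the loop, so it only ever sees the last value of cols. That last cols has length zero, hence your error.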

for row in table.find_all('div')[16:]:
    cols = row.find_all('div')
    print(len(cols))

Run the above and you will see cols ends up at length 0.
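The effect can be reproduced without any network access. The following is only a sketch: plain Python lists stand in for the result of row.find_all('div'), and the row contents and both function names are hypothetical.

```python
# Hypothetical stand-in for what row.find_all('div') returns for each row
rows = [
    ["1", "date", "rank", "VANTY3"],
    ["2", "date", "rank", "memories."],
    [],  # a trailing row with no nested divs
]


def last_cols_only(rows):
    """Mimic the question's code: the loop body only reassigns cols."""
    cols = []
    for row in rows:
        cols = row
    return cols  # only the final row survives the loop


def names_per_row(rows):
    """Use cols inside the loop and skip rows that are too short."""
    names = []
    for cols in rows:
        if len(cols) > 3:
            names.append(cols[3])
    return names


print(len(last_cols_only(rows)))
print(names_per_row(rows))
```

Indexing the result of last_cols_only(rows) with [3] would raise the same IndexError as in the question, because that final list is empty.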

This might also occur elsewhere in the loop, so you should test the length and decide whether your logic needs updating. You also need to account for whether there is a child a tag.

So, you might, for example, do the following (bs4 4.7.1+ required):

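answers = []

for row in table.find_all('div')[16:]:
    # select() accepts CSS selectors such as :has(); find_all() does not
    cols = row.select('div:has(> a)')
    # cols[3] needs at least 4 elements
    if len(cols) > 3:
        answers.append({
            'namePlayer': cols[3].a.text
        })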

Note that answers.append is now indented inside the loop, so it runs for each cols value. This may not fit your exact use case, as I am unsure what your desired result is. If you state the desired output, I will update accordingly.
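For experimenting offline, here is a self-contained sketch of a guarded lookup. The markup, the row class and the parse_names function are assumptions invented for illustration; the real forum page is laid out differently. It uses the built-in html.parser so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real forum page is laid out differently
SAMPLE_HTML = """
<div id="answers">
  <div class="row"><div>1</div><div>date</div><div>rank</div><div><a href="#">VANTY3</a></div></div>
  <div class="row"><div>2</div><div>date</div></div>
  <div class="row"><div>3</div><div>date</div><div>rank</div><div>no link</div></div>
</div>
"""


def parse_names(html):
    """Collect the link text from the 4th column of each row, if present."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(id="answers")
    answers = []
    for row in table.find_all("div", class_="row"):
        # Only direct children, so nested divs are not counted as columns
        cols = row.find_all("div", recursive=False)
        # Skip rows that are too short or whose 4th column has no <a> tag
        if len(cols) > 3 and cols[3].a is not None:
            answers.append({"namePlayer": cols[3].a.text})
    return answers


print(parse_names(SAMPLE_HTML))
```

Checking both the length of cols and the presence of the a tag means a short row or a column without a link is skipped instead of raising an exception.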

EDIT:

playerNames

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://jaze.ru/forum/topic?id=50&page=1')
soup = bs(r.content, 'lxml')
answer_blocks = soup.select('[id^=answer_]')
names = [i.text.strip() for i in soup.select('[id^=answer_] .left-side a')]
unique_names = {i.text.strip() for i in soup.select('[id^=answer_] .left-side a')}

You can preserve order and de-duplicate with OrderedDict (this is by @Michael; there are other solutions in that Q&A)

from bs4 import BeautifulSoup as bs
import requests
from collections import OrderedDict

r = requests.get('https://jaze.ru/forum/topic?id=50&page=1')
soup = bs(r.content, 'lxml')
answer_blocks = soup.select('[id^=answer_]')
names = [i.text.strip() for i in soup.select('[id^=answer_] .left-side a')]
unique_names = OrderedDict.fromkeys(names).keys()
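The same order-preserving de-duplication can also be written as an explicit loop, or with a plain dict, which keeps insertion order on Python 3.7+. This is only a sketch: the sample list and the dedupe_keep_order helper are hypothetical, and the list is hardcoded rather than scraped.

```python
def dedupe_keep_order(names):
    """Return names without duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique


# Hypothetical sample with repeats, not real scraped output
names = ["VANTY3", "KK#キング", "VANTY3", "memories.", "KK#キング"]

print(dedupe_keep_order(names))
print(list(dict.fromkeys(names)))  # same result via an insertion-ordered dict
```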

The code did not output answers ;(. I want to get the PlayerName for each answer on the website. — Switchback, Oct 21 '19 at 17:25
Please can you provide examples of the playerNames that should come out? — QHarr, Oct 21 '19 at 17:26
'namePlayer': VANTY3, 'namePlayer': KK#キング, 'namePlayer': memories., etc.: all the nicknames of the people who answered in this topic — Switchback, Oct 21 '19 at 17:31
Thanks a lot! It works, I get what I want. But I didn't understand how it sorts these names; is it random? — Switchback, Oct 21 '19 at 18:03
The list will be in the correct order. The set (the unique names) has no order. You could instead de-duplicate in a loop. — QHarr, Oct 21 '19 at 18:09
Thanks a lot! Time to understand your code >< :D so I can find other list elements — Switchback, Oct 21 '19 at 18:28

Bhargav Desai · Answer 3 · 2019-10-21T07:46:45.997

Why do you use a for loop to find all the div tags?

for row in table.find_all('div')[16:]:
    cols = row.find_all('div')

With this you get all the tags you want:

cols = table.find_all('div')[16:]

So just replace your loop with this line and you will get your answer.

edited Oct 21 '19 at 07:46

answered Oct 20 '19 at 19:52

