
I looked at the site's HTML source and found what I need for namePlayer: it is in the 4th column, inside an 'a' tag. So I tried to get it in answers.append with 'namePlayer': cols[3].a.text

But when I run it, I get an IndexError. I tried changing the index to 2, 3, 4 and 5, but nothing worked.

Issue: why do I get IndexError: list index out of range when everything looks OK (I think :D)?

source:

#!/usr/bin/env python3

import re
import urllib.request
from bs4 import BeautifulSoup

class AppURLopener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"


def get_html(url):
    opener = AppURLopener()
    response = opener.open(url)
    return response.read()

def parse(html):
    soup = BeautifulSoup(html)
    table = soup.find(id='answers')

    answers = []

    for row in table.find_all('div')[16:]:
        cols = row.find_all('div')

    answers.append({
        'namePlayer': cols[3].a.text
    })


    for answer in answers:
        print(answers)


def main():
    parse(get_html('http://jaze.ru/forum/topic?id=50&page=1'))

if __name__ == '__main__':
    main()

QHarr
Switchback

3 Answers


It sounds like you are using an index for which no list element exists. Remember that indexing starts at 0: a list with four elements has indices 0, 1, 2 and 3, so if you ask for element 10 you get an IndexError.
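A minimal illustration in plain Python, assuming a made-up row that has only three cells: indexing past the end raises IndexError, so checking the length first avoids the crash. The helper `get_col` is hypothetical, written only for this example.

```python
cols = ['rank', 'avatar', 'date']  # hypothetical row with only 3 cells

# Valid indices are 0, 1 and 2, so cols[3] is out of range.
try:
    cols[3]
except IndexError as exc:
    print('IndexError:', exc)  # prints: IndexError: list index out of range

def get_col(cols, index, default=None):
    """Hypothetical helper: return cols[index] if it exists, otherwise default."""
    return cols[index] if 0 <= index < len(cols) else default

print(get_col(cols, 2))  # date
print(get_col(cols, 3))  # None
```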


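You are overwriting cols on every iteration of your loop, and answers.append runs only once, after the loop ends, so it only sees the last value of cols. That last cols has length zero, hence your error.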

for row in table.find_all('div')[16:]:
    cols = row.find_all('div')
    print(len(cols))

Run the above and you will see that cols ends up with length 0.

This might also happen elsewhere in the loop, so you should test the length and also decide whether your logic needs updating. You also need to account for whether there is a child a tag at all.

So, you might, for example, do the following (bs4 4.7.1+ required):

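answers = []

for row in table.find_all('div')[16:]:
    # select() understands CSS selectors such as :has(); find_all() only matches tag names
    cols = row.select('div:has(> a)')
    # cols[3] exists only when there are at least 4 matches
    if len(cols) > 3:
        answers.append({
            'namePlayer': cols[3].a.text
        })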

Note that answers.append is now indented inside the loop, so it runs for every value of cols. This may not fit your exact use case, as I am unsure what your desired result is. If you state the desired output, I will update accordingly.
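The difference between looking up cols after the loop and doing it inside the loop can be reproduced without BeautifulSoup. In this sketch, made-up lists stand in for the cols found in each row; the names Alice and Bob are placeholders, not data from the forum.

```python
# Made-up data: each inner list stands in for the cols found in one row.
rows = [['1', 'avatar', 'date', 'Alice'], ['2', 'avatar', 'date', 'Bob'], []]

# Buggy pattern from the question: cols is reassigned on every pass,
# and the lookup runs only once, after the loop, on the last (empty) row.
for row in rows:
    cols = row
try:
    buggy = [cols[3]]
except IndexError:
    buggy = None  # the last row has no index 3

# Fixed pattern: look up and append inside the loop, skipping short rows.
names = []
for row in rows:
    cols = row
    if len(cols) > 3:
        names.append(cols[3])

print(buggy)  # None
print(names)  # ['Alice', 'Bob']
```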


EDIT:

playerNames

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://jaze.ru/forum/topic?id=50&page=1')
soup = bs(r.content, 'lxml')  # the 'lxml' parser needs the lxml package installed
# Each post has an id starting with "answer_"; the author link sits in its .left-side block
names = [i.text.strip() for i in soup.select('[id^=answer_] .left-side a')]
# A set removes duplicates but does not keep the original order
unique_names = set(names)

You can de-duplicate while preserving order with OrderedDict (this approach is by @Michael; there are other solutions in that Q&A):

from bs4 import BeautifulSoup as bs
import requests
from collections import OrderedDict

r = requests.get('https://jaze.ru/forum/topic?id=50&page=1')
soup = bs(r.content, 'lxml')
names = [i.text.strip() for i in soup.select('[id^=answer_] .left-side a')]
# OrderedDict keys are unique and keep the order in which names first appear
unique_names = list(OrderedDict.fromkeys(names))
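On Python 3.7 and later, a plain dict also keeps insertion order, so dict.fromkeys gives the same order-preserving de-duplication without importing OrderedDict. The names below are placeholder data, not scraped from the forum.

```python
names = ['Alice', 'Bob', 'Alice', 'Carol', 'Bob']  # placeholder data

# dict keys are unique and, since Python 3.7, keep insertion order
unique_names = list(dict.fromkeys(names))
print(unique_names)  # ['Alice', 'Bob', 'Carol']
```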
QHarr

Why do you use a for loop to find all the div tags?

for row in table.find_all('div')[16:]:
    cols = row.find_all('div')

This single call already gives you all the tags you want:

cols = table.find_all('div')[16:]

So just replace your loop with this line and you will get your answer.

Bhargav Desai