Web Crawler Array error: "list index out of range"

Question

I am not too strong in Python, but I am building a site for a guild I am part of in a game, and I am using a crawler to pull some of our members' data off another site (yes, I did receive permission to do so). I am using Beautiful Soup 4 with Python 3.7. I am receiving the error:

Traceback (most recent call last):
  File "/Users/UsersLaptop/Desktop/swgohScraper.py", line 21, in <module>
    temp = members[count]
IndexError: list index out of range

My code is here:

from requests import get
from bs4 import BeautifulSoup
# variables
count = 1

# lists to store data
names = []
gp = []
arenaRank = []

url = 'https://swgoh.gg/g/21284/gid-1-800-druidia/'
response = get(url)

soup = BeautifulSoup(response.text, 'html.parser')
type(soup)

members = soup.find_all('tr')
members.sort()

for users in members:
    temp = members[count]
    name = temp.td.a.strong.text
    names.append(name)
    count += 1

print(names)

I am guessing I am receiving this error because members has 50 entries in it but the 50th is null, and I would need to stop the list from appending if the data was null. However, when I tried putting an if statement under my for loop, such as:

if users.find('tr') is not None:

it does not fix the issue. It would be greatly appreciated if someone could explain how to solve this issue, and why the solution works. Thank you in advance!

PS even after looking at similarly asked questions I cannot seem to figure this out and it is extremely frustrating. — Jeremy McArthur, Jul 16 '18 at 04:45
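To illustrate why the traceback in the question appears, here is a minimal sketch that uses a plain list as a stand-in for the result of find_all (the row values are made up). The for loop runs once per element, but count starts at 1, so the last iteration reads one position past the end of the list. A null entry is not needed to trigger this.

```python
# Hypothetical stand-in for soup.find_all('tr'); the values are invented.
members = ["header row", "member A", "member B"]

count = 1
seen = []
error = None
try:
    for users in members:            # runs len(members) == 3 times
        seen.append(members[count])  # reads index 1, then 2, then 3
        count += 1
except IndexError as exc:
    error = str(exc)                 # index 3 does not exist

print(seen)   # ['member A', 'member B']
print(error)  # list index out of range
```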

score 0 · Answer 1 · answered Jul 16 '18 at 05:00

When you use a for ... in loop, you don't need the count variable.

for users in members:
    name = users.td.a.strong.text
    names.append(name)

answered Jul 16 '18 at 05:00

B45i


Brings up "AttributeError: 'NoneType' object has no attribute 'a'" when this happens. I realized I didn't need the count variable but I seemed to have it working in a better state with it. – Jeremy McArthur Jul 16 '18 at 09:00
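One likely cause of this AttributeError (an assumption, since the real page markup is not shown in the thread) is that some rows have no td, or no link inside it. In Beautiful Soup, users.td evaluates to None when the row contains no td tag, and None has no attribute a. A minimal sketch with made-up markup that skips such rows:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real guild page.
html = """
<table>
  <tr><th>Name</th><th>GP</th></tr>
  <tr><td><a href="/u/a/"><strong>Alice</strong></a></td><td>100</td></tr>
  <tr><td>no link here</td><td>0</td></tr>
  <tr><td><a href="/u/b/"><strong> Bob </strong></a></td><td>200</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

names = []
for row in soup.find_all("tr"):
    # select_one returns None when the row has no <strong> inside an <a> inside a <td>
    strong = row.select_one("td a strong")
    if strong is None:
        continue  # header row, or a row without a member link
    names.append(strong.get_text(strip=True))

print(names)  # ['Alice', 'Bob']
```

Checking for None before going further down the chain handles the header row and any incomplete row the same way, so no index or counter is needed.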

score 0 · Answer 2 · answered Jul 16 '18 at 05:03

Set count = 0 at the start, because list indices start at 0.

answered Jul 16 '18 at 05:03

Spaceship222


I realize my index does not start at 0, but the first item in the list is not one I need, so I am skipping it. This does not solve the issue. – Jeremy McArthur Jul 16 '18 at 08:59
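If the goal is only to skip the first item, a slice does that without a separate counter. A small sketch, again with a made-up list in place of the real rows:

```python
# Hypothetical stand-in for the list of rows; the first entry is the one to skip.
members = ["header row", "member A", "member B"]

names = []
for user in members[1:]:  # members[1:] is a copy of the list without index 0
    names.append(user)

print(names)  # ['member A', 'member B']
```

Because the loop iterates over the slice itself, it can never read past the end of the list.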

score 0 · Answer 3 · answered Jul 16 '18 at 05:17

Your code should be like this:

from requests import get
from bs4 import BeautifulSoup    
# lists to store data
names = []
gp = []
arenaRank = []

url = 'https://swgoh.gg/g/21284/gid-1-800-druidia/'
response = get(url)

soup = BeautifulSoup(response.text, 'html.parser')
type(soup)

members = soup.find_all('tr')
members.sort()

for users in members:
    name = users.td.a.strong.text
    names.append(name)


print(names)

You could change count to 0, because Python indexing starts from 0, but it is still best to use the loop variable users directly.

answered Jul 16 '18 at 05:17

U13-Forward


This code does not work. I realize that Python indexing starts at 0. I also originally wrote this code, but it brings up an error saying "'NoneType' object has no attribute 'a'". It is clearly grabbing something that has no anchor tag, so I would have to leave out one of the 50 elements, but I am not sure how. I believe there are 49 members out of a possible 50, so I need the 49 members and then for it to fill in the one empty spot with just an empty string. – Jeremy McArthur Jul 16 '18 at 08:56
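Filling the unused spot with an empty string can be done after the names are collected. A minimal sketch, assuming a maximum of 50 slots as described in the comment above (GUILD_SIZE and the generated names are hypothetical):

```python
GUILD_SIZE = 50  # assumed maximum number of members, taken from the comment above

# 49 invented names standing in for the scraped ones
names = ["member %d" % i for i in range(1, 50)]

# append one empty string per missing member; adds nothing if the list is full
padded = names + [""] * (GUILD_SIZE - len(names))

print(len(padded))       # 50
print(repr(padded[-1]))  # ''
```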

experiment · Accepted Answer · 2018-07-16T09:23:01.213

This should do what your code is trying to do, i.e. collect the names, as can be inferred from the code:

from requests import get
from bs4 import BeautifulSoup

# lists to store data
names = []
gp = []
arenaRank = []

url = 'https://swgoh.gg/g/21284/gid-1-800-druidia/'
response = get(url)

soup = BeautifulSoup(response.content, 'html.parser')

# every <strong> tag on the page is treated as a candidate name
for users in soup.findAll('strong'):
    text = users.text.strip()
    # test the text before encoding: in Python 3, bytes never compare equal to ''
    if text != '':
        names.append(text.encode("utf-8"))

print(names)

edited Jul 16 '18 at 09:23

answered Jul 16 '18 at 06:32

experiment


You are the closest to solving my issue, and the only one who hasn't given code that throws an error, so thank you! This is printing something like this: [u'Note', u'', u'GP'] which I am confused about because shouldn't strip be removing the u and the ' '? – Jeremy McArthur Jul 16 '18 at 09:08
That's just the encoding part. I have edited the answer to handle it! – experiment Jul 16 '18 at 09:15
I added code to ignore any empty data ('') too. – experiment Jul 16 '18 at 09:24
Thank you!! Can you explain why you need the .encode part? You are the only one who understood my question and were able to solve it. I really am having trouble understanding what the .encode("utf-8") does though. – Jeremy McArthur Jul 16 '18 at 09:24
Refer to this: https://stackoverflow.com/questions/2241348/what-is-unicode-utf-8-utf-16 and just one piece of friendly advice: before asking a question, read about the topic thoroughly. I know learning new things can be challenging, but you will learn gradually; just try and read every nook and corner about it. It's a good practice and will definitely help you grow! Happy coding and learning! – experiment Jul 16 '18 at 09:28
Okay thank you so much. I am only in college so I'm doing my best. I really love coding and I know I have a TON to learn. It's hard because I feel like I am pushing myself hard and stepping into topics that may be a bit advanced for me at the time but I don't want to slow down. All of this is simply for fun and practice. I really appreciate you being nice and bearing with me :) – Jeremy McArthur Jul 16 '18 at 09:32
You will figure things out! Just stay calm, do what you love and always be curious! Don't be disheartened by failures. Best Wishes! – experiment Jul 16 '18 at 09:35
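On the question of what .encode("utf-8") does: it converts a text string into the raw bytes that represent it in UTF-8. The u prefix seen in output such as [u'Note', u'', u'GP'] is how Python 2 displays unicode strings inside a list; it is not part of the data, which is why strip() does not remove it. A short sketch under Python 3 (the sample text is made up):

```python
text = u"  Note \u2605 "            # hypothetical scraped text with a non-ASCII star
stripped = text.strip()             # strip() removes surrounding whitespace only
encoded = stripped.encode("utf-8")  # text -> UTF-8 bytes

print(type(stripped).__name__)  # str
print(type(encoded).__name__)   # bytes
print(encoded)                  # b'Note \xe2\x98\x85'

# decoding with the same codec gives the original text back
print(encoded.decode("utf-8") == stripped)  # True

# in Python 3, bytes and str are different types and never compare equal
print(b"" != "")  # True
```

In Python 3 the text returned by Beautiful Soup is already a str, so encoding is usually only needed when writing bytes to a file or socket, not for printing.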
