
I'm trying to loop through six Wikipedia pages to get a list of every song linked. It gives me this error when I run it in Terminal:

Traceback (most recent call last):
  File "scrapeproject.py", line 31, in <module>
    print (getTableLinks(my_url))
  File "scrapeproject.py", line 20, in getTableLinks
    html = urlopen(my_url)
  File "/Users/adriana/Software/Python-3.5.1/mybuild/lib/python3.5/urllib/request.py", line 162, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/adriana/Software/Python-3.5.1/mybuild/lib/python3.5/urllib/request.py", line 456, in open
    req.timeout = timeout
AttributeError: 'NoneType' object has no attribute 'timeout'

I think this is because a None keeps showing up when I print the song list. Anyone have any suggestions?

Code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import sys
import http.client

main = "https://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_"
year = 2009

def createUrl(main, year):
    for i in range(0, 6): # increment years so i can get each link
        year += 1
        print ("\n\n", year, "\n\n")
        fullUrl = main + str(year)
        return fullUrl


my_url = createUrl(main, year) # this is how i make createUrl a variable to be used in other functions

def getTableLinks(my_url): # there is a random none appearing in my code

    # i think the problem is between here...

    html = urlopen(my_url)
    bsObj = BeautifulSoup(html.read(), "html.parser")
    tabledata = bsObj.find("table", {"class":"wikitable"}).find_all("tr")

    # ...and here

    for table in tabledata:
        try:
            links = table.find("a")
            if 'href' in links.attrs:
                print (links.attrs['href'])
        except:
            pass

print (getTableLinks(my_url))

2 Answers

You're not returning anything from createUrl, so None gets returned instead.

If you want to build a batch of six URLs and then scrape each one, I'd suggest appending them to a list and returning that list. You can then iterate through it and call the parsing function on each URL, or map the parsing function over the list.
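As a rough sketch of the list approach described above: the helper name create_urls and its first_year/last_year parameters are made up for illustration, and the loop only prints each URL where the scraping call would go.

```python
MAIN = "https://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_"


def create_urls(main, first_year, last_year):
    """Return one URL per year, from first_year through last_year inclusive.

    Hypothetical helper: the name and parameters are assumptions for this sketch.
    """
    # Build the whole list first; a return inside a for loop would exit
    # after the first year and never reach the others.
    return [main + str(year) for year in range(first_year, last_year + 1)]


urls = create_urls(MAIN, 2010, 2015)

for url in urls:
    # Placeholder: call getTableLinks(url) here to scrape each page.
    print(url)
```

Returning the full list keeps URL construction separate from scraping, so the range of years can change without touching the parsing function.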

Pythonista
  • So if I change from print (fullUrl) to return fullUrl, it gives me: `2010 /wiki/Tik_Tok /wiki/Need_You_Now_(Lady_Antebellum_song) /wiki/Hey,_Soul_Sister /wiki/California_Gurls ... /wiki/Teach_Me_How_to_Dougie /wiki/Try_Sleeping_with_a_Broken_Heart /wiki/Lover,_Lover None` This is exactly what I want but I want it to do this up to 2015 instead of it just stopping there. I think the None means it can't move on to 2011 for some reason? – Adriana Barbat Mar 18 '16 at 21:36

The problem isn't in the area you highlighted. The problem is in the loop where you construct the fullUrl. Get rid of that entirely, as you don't need a function to construct the link.

Then below your function definitions, try:

for n in range(2008,2015):
    print(getTableLinks(main + str(n)))

Change the years to fit your needs.
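One likely reason a stray None shows up in the output: getTableLinks prints the links itself and has no return statement, so wrapping the call in print() also prints the function's return value, which is None. A minimal illustration, using two made-up helper functions and a hard-coded sample list rather than the real scraper:

```python
def print_links(links):
    """Print each link. There is no return statement, so this returns None."""
    for link in links:
        print(link)


def collect_links(links):
    """Return the links and let the caller decide what to print."""
    return list(links)


# Sample data taken from the output shown in the thread.
sample = ["/wiki/Tik_Tok", "/wiki/Need_You_Now_(Lady_Antebellum_song)"]

printed_result = print_links(sample)      # prints the two links
returned_result = collect_links(sample)   # prints nothing

# print(print_links(sample)) would print the links and then the value
# of printed_result, which is None.
```

So either call getTableLinks(...) without print(), or have it build and return a list of links and print that list.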

A better approach for future use is error handling. It lets you run the function until there are no years left: the failed call raises an exception and the loop exits. This saves you from having to check how many years there are; you only adjust the starting year. To do this properly, read up on error handling and catch the specific error raised by an invalid year, with something like except AttributeError: (or whatever the error actually is) in place of the bare except in the code sample below.

for n in range(2008,2015):
    try:
        print(getTableLinks(main + str(n)))
    except:
        break
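A sketch of the same loop with specific exceptions instead of a bare except. Which errors a missing year actually raises is an assumption here: urlopen raises urllib.error.HTTPError for an error status such as 404, and calling .find_all on the None returned by a failed find("table", ...) raises AttributeError. The names scrape_until_missing, fetch_links, max_years and fake_fetch are hypothetical; fake_fetch stands in for getTableLinks(main + str(year)) so the example runs without network access.

```python
from urllib.error import HTTPError


def scrape_until_missing(fetch_links, start_year, max_years=100):
    """Call fetch_links(year) for consecutive years until one fails.

    fetch_links is a hypothetical callable standing in for the real
    scraper; max_years is a safety cap so the loop always terminates.
    """
    results = {}
    for year in range(start_year, start_year + max_years):
        try:
            results[year] = fetch_links(year)
        except (HTTPError, AttributeError):
            # HTTPError: the server answered with an error status.
            # AttributeError: no matching table, so find() returned None.
            break
    return results


def fake_fetch(year):
    """Stand-in scraper: pretends pages exist only through 2015."""
    if year > 2015:
        raise AttributeError("'NoneType' object has no attribute 'find_all'")
    return ["/wiki/Example_song_%d" % year]


results = scrape_until_missing(fake_fetch, 2010)
print(sorted(results))
```

Catching only the expected exceptions means unrelated bugs, such as a typo in a variable name, still surface as errors instead of silently ending the loop.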
Chris
  • Thank you so much! I'm still getting a weird None but it finally loops through my code and gets all the links I need. – Adriana Barbat Mar 18 '16 at 22:49
  • Good. Feel free to upvote. – Chris Mar 18 '16 at 23:00
  • Would love to... but I'm so new I don't have the privilege to lol – Adriana Barbat Mar 18 '16 at 23:46
  • @Chris for the record, "feel free to upvote" is a rude thing to say. Asking for an accept, with proper caution and humility, might be more adequate. But first you should wait for OP to choose on their own. Only if you suspect that they are lacking knowledge about the workings of the site (such as yourself), should you try to instruct them about how accepting and upvoting works. – Andras Deak -- Слава Україні Mar 19 '16 at 00:25
  • @AndrasDeak Yeah that's what that was intended to do. – Chris Mar 19 '16 at 01:09
  • Then keep practicing ;) The tone should definitely be [along these lines](http://stackoverflow.com/questions/35633421/how-to-remove-omit-smaller-contour-lines-using-matplotlib/35664089#comment59090583_35664089), and you should wait a bit (at least a day!) before badgering OP about any of this. (This is all, I'll leave you alone now, just wanted to be clear on what I mean.) – Andras Deak -- Слава Україні Mar 19 '16 at 01:11