I am new to Python and I am trying to write a website scraper to get links from subreddits, which I can then pass to another class later on for automatically downloading images from Imgur.
In this code snippet, I am just trying to read the subreddit and scrape any Imgur URLs out of the hrefs, but I get the following error:
AttributeError: 'list' object has no attribute 'timeout'
Any idea as to why this might be happening? Here is the code:
from bs4 import BeautifulSoup
from urllib2 import urlopen
import sys
from urlparse import urljoin

def get_category_links(base_url):
    url = base_url
    html = urlopen(url)
    soup = BeautifulSoup(html)
    # get the links with the class "title may-blank ...",
    # which is how reddit defines posts
    posts = soup('a', {'class': 'title may-blank loggedin outbound'})
    for post in posts:
        # print the post's title
        print post.contents[0]
        # print the url; if it is a relative url,
        # print the absolute url instead
        if post['href'][:4] == 'http':
            print post['href']
        else:
            print urljoin(url, post['href'])

get_category_links(sys.argv)
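For reference, I run the script as `python scraper.py http://www.reddit.com/r/pics` (`scraper.py` is just what I happen to call the file), so I believe `sys.argv` holds something like the stand-in list below:

```python
# Stand-in for what sys.argv should contain when the script is run as
#   python scraper.py http://www.reddit.com/r/pics
# argv[0] is the script name, argv[1] the first command-line argument.
argv = ["scraper.py", "http://www.reddit.com/r/pics"]

print(type(argv).__name__)  # list
print(argv[1])              # http://www.reddit.com/r/pics
```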