
I am trying to get the hrefs from a URL, put them into a list, and print one item from that list, for example the third. Instead, all I get is the third character of every href.

import urllib
from bs4 import BeautifulSoup

newlist=[]
page = urllib.urlopen("http://python-data.drchuck.net/known_by_Kamran.html").read()
soup = BeautifulSoup(page, "html.parser")
tags = soup.find_all('a')
for tag in tags:
    newlist=tag.get("href", None)
    print newlist[2]

The output is: t t t t t t t...

Padraic Cunningham
Yiqun
  • You are reassigning `newlist=tag.get("href", None)` which is a string or None not a list. This is very basic stuff, you should consider reading a few tutorials. – Padraic Cunningham Oct 08 '16 at 16:46
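
To illustrate the point made in the comment: indexing a string returns a single character, while indexing a list returns a whole element. The following is only a minimal sketch; the URLs are hypothetical placeholders standing in for the values that `tag.get("href")` would return, not links taken from the page in the question.

```python
# Hypothetical href values standing in for tag.get("href") results.
hrefs_found = [
    "http://example.com/a.html",
    "http://example.com/b.html",
    "http://example.com/c.html",
]

# Reassigning: the name ends up bound to a single string,
# so [2] picks the third character of that string.
newlist = []
for href in hrefs_found:
    newlist = href
third_char = newlist[2]

# Appending: the list grows by one element per href,
# so [2] picks the third href.
collected = []
for href in hrefs_found:
    collected.append(href)
third_href = collected[2]

print(third_char)
print(third_href)
```

Every href here starts with "http", which is why the third character is always "t", matching the output reported in the question.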

1 Answer


The code below prints every href correctly.

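import urllib
from bs4 import BeautifulSoup

# Python 2 code: urllib.urlopen and the print statement do not exist in Python 3.
page = urllib.urlopen("http://www.django-rest-framework.org/api-guide/throttling/#how-clients-are-identified").read()
soup = BeautifulSoup(page, "html.parser")
# href=True keeps only the <a> tags that actually have an href attribute.
tags = soup.find_all('a', href=True)
for tag in tags:
    print tag['href']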

PS: The webpage you mentioned is not accessible, so I used a different one.
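
The question also asks for the hrefs to be collected into a list so that one of them, for example the third, can be printed. The sketch below shows one possible way to do that in Python 3. To keep it runnable without a network connection or third-party packages, it swaps BeautifulSoup for the standard library's `html.parser` and parses a made-up HTML snippet; `HrefCollector` and `SAMPLE_HTML` are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs for the tag.
        href = dict(attrs).get("href")
        if href is not None:
            # append() adds one element; it does not replace the list.
            self.hrefs.append(href)


# Made-up sample markup; a real page would be downloaded first.
SAMPLE_HTML = """
<ul>
  <li><a href="http://example.com/one.html">One</a></li>
  <li><a href="http://example.com/two.html">Two</a></li>
  <li><a name="no-href">Anchor without href</a></li>
  <li><a href="http://example.com/three.html">Three</a></li>
</ul>
"""

collector = HrefCollector()
collector.feed(SAMPLE_HTML)
hrefs = collector.hrefs

# Index the list only after the loop has finished filling it.
if len(hrefs) >= 3:
    print(hrefs[2])
```

With BeautifulSoup the same pattern should apply: call `newlist.append(tag['href'])` inside the `for tag in tags:` loop, then read `newlist[2]` once the loop is done.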

Deendayal Garg