How to extract link from inside the <h2 class=section-heading>: BeautifulSoup

Question

I am trying to extract a link which is written like this:

<h2 class="section-heading">
    <a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a>
</h2>

my code is:

from bs4 import BeautifulSoup
import requests, re

def get_data():
    url='http://www.nytimes.com/'
    s_code=requests.get(url)
    plain_text = s_code.text
    soup = BeautifulSoup(plain_text)
    head_links=soup.findAll('h2', {'class':'section-heading'})

    for n in head_links :
       a = n.find('a')
       print a
       print n.get['href'] 
       #print a['href']
       #print n.get('href')
       #headings=n.text
       #links = n.get('href')
       #print headings, links

get_data()

The line "print a" simply prints the whole <a> element inside the <h2 class="section-heading">, i.e.

<a href="http://www.nytimes.com/pages/world/index.html">World »</a>

But when I execute "print n.get['href']", it throws an error:

print n.get['href'] 
TypeError: 'instancemethod' object has no attribute '__getitem__'

Am I doing something wrong here? Please help.

I couldn't find a similar question here. My case is a bit unusual: I am trying to extract a link that is inside an element with a specific class name, section-heading.

Also, I think you mean to do `a.get('href')` and not `n.get` — Obsidian, Feb 12 '16 at 06:23
@cricket_007 that duplicate question does not answer this exact error, though it is useful; and it is also for an earlier version of the library. — Antti Haapala -- Слава Україні, Feb 12 '16 at 06:25
@AnttiHaapala - I was addressing the end-goal of the question, not the error, but yes I see what you're saying — OneCricketeer, Feb 12 '16 at 06:30

score 4 · Accepted Answer · edited May 23 '17 at 10:28

First of all, you want to fetch the href of the <a> element, so you should be accessing a, not n, on that line. Secondly, it should be either

a.get('href')

or

a['href']

The latter form raises a KeyError if the attribute is missing, whereas the former returns None, just like the usual dictionary/mapping interface. Since .get is a method, it must be called (.get(...)); indexing it (.get[...]) does not work, and that is what causes the TypeError in the question.
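
To make the difference concrete, here is a minimal sketch in Python 3 syntax. It assumes a small hand-written HTML string modeled on the markup in the question (the second heading is made up for illustration) and the built-in html.parser. It also shows why calling .get('href') on the <h2> itself gives None: the <h2> has no href attribute, only the <a> inside it does.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question; the second <h2> is a made-up example.
html = '''
<h2 class="section-heading">
    <a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a>
</h2>
<h2 class="section-heading"><a name="no-link">Plain anchor</a></h2>
'''

soup = BeautifulSoup(html, "html.parser")
h2 = soup.find("h2", class_="section-heading")
a = h2.find("a")

href_via_get = a.get("href")    # method call: returns the value, or None if missing
href_via_index = a["href"]      # item access: raises KeyError if missing
h2_href = h2.get("href")        # the <h2> has no href attribute, so this is None

# An <a> without href: .get returns None, [] raises KeyError.
plain = soup.find_all("a")[1]
missing_via_get = plain.get("href")
try:
    plain["href"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Indexing the method object itself, as in the question, raises TypeError.
try:
    a.get["href"]
    raised_type_error = False
except TypeError:
    raised_type_error = True
```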

Note that find can also fail there and return None. Perhaps you wanted to iterate over n.find_all('a', href=True) instead:

for n in head_links:
    for a in n.find_all('a', href=True):
        print(a['href'])
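
As a rough illustration of why the nested loop is safer, the sketch below runs both approaches on a hand-written sample (the three headings are assumptions, not real page content). find returns None for a heading with no <a>, while find_all with href=True simply yields nothing for headings that lack a usable link.

```python
from bs4 import BeautifulSoup

# Made-up sample: one heading with a link, one without any <a>,
# and one whose <a> has no href attribute.
html = '''
<h2 class="section-heading"><a href="/pages/world/index.html">World »</a></h2>
<h2 class="section-heading">No link here</h2>
<h2 class="section-heading"><a name="anchor-only">Anchor without href</a></h2>
'''

soup = BeautifulSoup(html, "html.parser")
head_links = soup.find_all("h2", class_="section-heading")

# find returns None when a heading contains no <a> at all.
first_anchors = [n.find("a") for n in head_links]

# find_all with href=True only yields <a> elements that carry an href,
# so indexing a['href'] inside the loop cannot fail.
links = []
for n in head_links:
    for a in n.find_all("a", href=True):
        links.append(a["href"])
```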

Even easier than find_all is the select method, which takes a CSS selector. Here a single call returns only those <a> elements that have an href attribute and sit inside an <h2 class="section-heading">, much as you would do it with jQuery.

soup = BeautifulSoup(plain_text, 'html.parser')  # name the parser explicitly
for a in soup.select('h2.section-heading a[href]'):
    print(a['href'])
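
The question's commented-out lines suggest the goal was to print each heading together with its link. A possible sketch of that with select, again on hand-written sample markup (the "other" heading and the bare link are made up to show what the selector skips):

```python
from bs4 import BeautifulSoup

# Made-up sample: two matching headings, plus links the selector should ignore.
html = '''
<h2 class="section-heading">
    <a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a>
</h2>
<h2 class="section-heading">
    <a href="http://www.nytimes.com/pages/world/index.html">World »</a>
</h2>
<h2 class="other"><a href="/ignored-wrong-class">Ignored</a></h2>
<p><a href="/ignored-not-in-h2">Ignored too</a></p>
'''

soup = BeautifulSoup(html, "html.parser")

# One (heading text, link) pair per <a href> inside <h2 class="section-heading">.
sections = [
    (a.get_text(strip=True), a["href"])
    for a in soup.select("h2.section-heading a[href]")
]

for heading, link in sections:
    print(heading, link)
```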

(Also, please use the lower-case method names, such as find_all rather than findAll, in any new code that you write.)

Yes, I've tried that as well. If you look at the commented-out line in my code, "links = n.get('href')", it returns None in every iteration of the for loop, and n['href'] throws an error: ' return self.attrs[key] KeyError: 'href' ' — Anum Sheraz, Feb 12 '16 at 06:25
