Why does Beautiful Soup return the file name instead of the full link?

Question

Using the simple code below, I'm facing the following problem: why does Beautiful Soup return only the file names rather than the full link addresses?

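from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 replacement for urllib2

url = 'http://www.gks.ru/bgd/free/B00_25/IssWWW.exe/Stg/d000/I000650R.HTM'
data = urlopen(url).read()
page = BeautifulSoup(data, 'lxml')
# href=True skips <a> tags without an href attribute (which would print None)
for link in page.find_all('a', href=True):
    print(link['href'])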

This is all I get as output:

I000660R.HTM
I000670R.HTM
I000680R.HTM
I000690R.HTM
I000700R.HTM
I000706R.HTM
I000707R.HTM
I000708R.HTM
I000709R.HTM
000710.HTM
000711.HTM
000712.HTM
000713.HTM
000714.HTM
000715.HTM

Presumably because the href links on that page are all relative. — jonrsharpe, Jan 16 '16 at 13:39
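To illustrate the comment: a relative href such as I000660R.HTM carries no host or directory of its own; a browser resolves it against the URL of the page that contains it. Python's standard library performs the same resolution with urllib.parse.urljoin. A minimal sketch using the page URL from the question and two of the hrefs from its output:

```python
from urllib.parse import urljoin

# Page that contains the links (taken from the question)
page_url = 'http://www.gks.ru/bgd/free/B00_25/IssWWW.exe/Stg/d000/I000650R.HTM'

# Two of the relative hrefs printed in the question's output
hrefs = ['I000660R.HTM', '000710.HTM']

# urljoin replaces the last path segment of page_url with each relative href
absolute = [urljoin(page_url, h) for h in hrefs]
for a in absolute:
    print(a)
```

Each href replaces only the last segment (I000650R.HTM) of the page URL, so the results stay in the same d000/ directory.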

Answer 1 · score 0 · answered Jan 16 '16 at 13:45

Problem solved: since the links are relative, I concatenated each href with the root of the URL. Thanks.
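Plain concatenation works on this page because every link is a sibling file in the same directory. A more general variant, swapped in here rather than taken from the answer, is urllib.parse.urljoin, which also copes with hrefs that are already absolute or that climb directories with ../. A minimal Python 3 sketch, assuming Beautiful Soup 4 with its built-in 'html.parser'; the HTML snippet, the ../index.htm link and the example.com URL are hypothetical stand-ins for the live page:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# URL of the page the HTML came from; relative hrefs are resolved against it
page_url = 'http://www.gks.ru/bgd/free/B00_25/IssWWW.exe/Stg/d000/I000650R.HTM'

# Hypothetical HTML standing in for the downloaded page
html = '''
<a href="I000660R.HTM">next</a>
<a href="../index.htm">up</a>
<a href="http://example.com/other.htm">external</a>
<a name="anchor-without-href">no link</a>
'''

def absolute_links(markup, base_url):
    """Return every href in markup, resolved to an absolute URL against base_url."""
    soup = BeautifulSoup(markup, 'html.parser')
    # href=True skips <a> tags that have no href attribute
    return [urljoin(base_url, a['href']) for a in soup.find_all('a', href=True)]

for link in absolute_links(html, page_url):
    print(link)
```

With the live page, pass the downloaded bytes (data in the question) instead of the snippet. The base must be the page's own URL, not the site root http://www.gks.ru/, otherwise the d000/ directory is lost.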

mr_bungles
