This isn't just a simple "how do I retrieve links" question. When I scrape a page, the href comes back as something like '/people/4849247002', but if you inspect the page itself and click the link, it actually goes to 'https://website/people/4849247002'. How can I get the full 'https://website/people/4849247002' link instead?
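To illustrate what I'm after, here is a minimal sketch that resolves a relative href against the URL of the page it was scraped from, the way a browser would. It assumes the standard library's `urllib.parse.urljoin` is a suitable tool for this; `BASE_URL` and `to_absolute` are hypothetical names made up for the example, and the page path is a placeholder.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that was scraped (placeholder, not a real site).
BASE_URL = "https://website/some/page"


def to_absolute(base_url, href):
    """Resolve an href relative to the page it was found on."""
    # urljoin leaves already-absolute URLs untouched and resolves
    # root-relative paths like '/people/...' against the base's scheme and host.
    return urljoin(base_url, href)


print(to_absolute(BASE_URL, "/people/4849247002"))
```

With BeautifulSoup, the same call could presumably be applied to each `a["href"]` value found on the page.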
Side note: what's the correct way to use BeautifulSoup to get a webpage? I've been using both of the following:
from BeautifulSoup import BeautifulSoup
import urllib2
import re
html_page = urllib2.urlopen("http://www.yourwebsite.com")
soup = BeautifulSoup(html_page)
and
import requests
from bs4 import BeautifulSoup
import re
import time
source_code = requests.get('https://stackoverflow.com/')
soup = BeautifulSoup(source_code.content, 'lxml')
I'm currently using Python 3.8.