
I'm trying to get links to articles from https://finance.yahoo.com/topic/stock-market-news. I run the following code using Python 3:

from bs4 import BeautifulSoup
import requests

url = "https://finance.yahoo.com/topic/stock-market-news"
r1 = requests.get(url)
page = r1.content
soup = BeautifulSoup(page, 'html5lib')
#print(soup.prettify())
href = soup.find_all('a')
boxes = []
links = []
for ref in href:
    curr = ref.parent.find('u')
    if curr is not None:
        boxes.append(ref)
        links.append(ref['href'])
print(boxes)
print(links)

I do manage to get the links, but some of them look odd:

/news/stock-market-news-live-july-30-2020-221505732.html
/m/f39537a4-425d-3378-9ef7-e7188a513ca6/stock-index-futures-lower.html
/m/6c87eec2-e5a1-3bc3-916e-4f74b3c508bf/global-stocks-slump-as-u-s-.html
https://finance.yahoo.com/news/q2-gdp-us-economy-coronavirus-pandemic-consumer-171558880.html
https://finance.yahoo.com/video/influencers-andy-serwer-bill-gates-110000273.html
https://finance.yahoo.com/news/jobless-claims-week-ending-july-25-123150219.html

Why is this happening, and how can I access those links?

A sub-question: the site has many more links than I am finding. I think this is because the site loads more articles as you scroll down. How can I work around that so that I can load a certain number of additional articles, for example 10 more?

  • prepend `finance.yahoo.com` to them, they're relative links – M Z Jul 30 '20 at 13:04
  • So how come some hrefs are relative links while others have the absolute path? I tried to check on the site, but when I inspect the elements I only see relative links. – shaked migdal Jul 30 '20 at 13:08
  • @shakedmigdal Refer https://stackoverflow.com/questions/2005079/absolute-vs-relative-urls – bigbounty Jul 30 '20 at 13:11
  • @bigbounty I get the difference between the two. What I don't understand is why some are returned as relative and some as absolute, while manually inspecting the element always shows a relative path. Does the prefix get added automatically later on? – shaked migdal Jul 30 '20 at 13:16

1 Answer


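The odd-looking links are relative URLs. Replace the links.append(ref['href']) call with this line, which prepends the site root to any link that does not already start with it: links.append(link if link.startswith("https://finance.yahoo.com") else f"https://finance.yahoo.com{link}")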

from bs4 import BeautifulSoup
import requests

url = "https://finance.yahoo.com/topic/stock-market-news"
r1 = requests.get(url)
page = r1.content
soup = BeautifulSoup(page, 'html5lib')
#print(soup.prettify())
href = soup.find_all('a')
boxes = []
links = []
for ref in href:
    # Keep only anchors whose parent element contains a <u> tag
    curr = ref.parent.find('u')
    if curr is not None:
        boxes.append(ref)
        link = ref['href']
        # Relative hrefs (e.g. /news/...) get the site root prepended
        links.append(link if link.startswith("https://finance.yahoo.com") else f"https://finance.yahoo.com{link}")
print(boxes)
print("___"*10)
print(links)

Output:

[<a class="Fw(b) Fz(18px) Lh(23px) LineClamp(2,46px) Fz(17px)--sm1024 Lh(19px)--sm1024 LineClamp(2,38px)--sm1024 mega-item-header-link Td(n) C(#0078ff):h C(#000) LineClamp(2,46px) LineClamp(2,38px)--sm1024 not-isInStreamVideoEnabled" data-reactid="11" href="/m/d79af817-5b40-3545-a085-322c5d27628e/dow-futures-slump-as-q2-gdp.html" target="_self"><u class="StretchedBox" data-reactid="12"></u><!-- react-text: 13 -->Dow Futures Slump As Q2 GDP Plunges Most On Record, Weekly Jobless Claims Rise; Trump Raises Election Delay Prospect<!-- /react-text --></a>, <a class="Fw(b) Fz(18px) Lh(23px) LineClamp(2,46px) Fz(17px)--sm1024 Lh(19px)--sm1024 LineClamp(2,38px)--sm1024 mega-item-header-link Td(n) C(#0078ff):h C(#000) LineClamp(2,46px) LineClamp(2,38px)--sm1024 not-isInStreamVideoEnabled" data-reactid="28" href="/m/8f0877fd-0c34-306c-964d-2c9dd2aebd3c/ups-stock-is-jumping-after.html" target="_self"><u class="StretchedBox" data-reactid="29"></u><!-- react-text: 30 -->UPS Stock Is Jumping After the Company Delivered Smashing Earnings<!-- /react-text --></a>, <a class="Fw(b) Fz(18px) Lh(23px) LineClamp(2,46px) Fz(17px)--sm1024 Lh(19px)--sm1024 LineClamp(2,38px)--sm1024 mega-item-header-link Td(n) C(#0078ff):h C(#000) LineClamp(2,46px) LineClamp(2,38px)--sm1024 not-isInStreamVideoEnabled" data-reactid="48" href="/news/futures-sink-data-shows-historic-125417167.html"><u class="StretchedBox" data-reactid="49"></u><!-- react-text: 50 -->Futures sink as data shows historic slump<!-- /react-text --></a>, <a class="Fz(13px) LineClamp(4,96px) C(#0078ff):h Td(n) C($c-fuji-blue-4-b) smartphone_C(#000) smartphone_Fz(19px)" data-reactid="11" href="https://finance.yahoo.com/news/q2-gdp-us-economy-coronavirus-pandemic-consumer-171558880.html"><span class="Fw(600) smartphone_Fw(500)" data-reactid="12">Q2 GDP: US economy contracted by worst-ever 32.9% in Q2, crushed by coronavirus lockdowns</span><u class="StretchedBox Z(1)" data-reactid="13"></u></a>, <a class="Fz(13px) LineClamp(4,96px) 
C(#0078ff):h Td(n) C($c-fuji-blue-4-b) smartphone_C(#000) smartphone_Fz(19px)" data-reactid="26" href="https://finance.yahoo.com/video/influencers-andy-serwer-bill-gates-110000273.html"><span class="Fw(600) smartphone_Fw(500)" data-reactid="27">Influencers with Andy Serwer: Bill Gates</span><u class="StretchedBox Z(1)" data-reactid="28"></u></a>, <a class="Fz(13px) LineClamp(4,96px) C(#0078ff):h Td(n) C($c-fuji-blue-4-b) smartphone_C(#000) smartphone_Fz(19px)" data-reactid="38" href="https://finance.yahoo.com/news/jobless-claims-week-ending-july-25-123150219.html"><span class="Fw(600) smartphone_Fw(500)" data-reactid="39">Jobless claims top 1M again in latest week as coronavirus keeps battering workers</span><u class="StretchedBox Z(1)" data-reactid="40"></u></a>]
______________________________
['https://finance.yahoo.com/m/d79af817-5b40-3545-a085-322c5d27628e/dow-futures-slump-as-q2-gdp.html', 'https://finance.yahoo.com/m/8f0877fd-0c34-306c-964d-2c9dd2aebd3c/ups-stock-is-jumping-after.html', 'https://finance.yahoo.com/news/futures-sink-data-shows-historic-125417167.html', 'https://finance.yahoo.com/news/q2-gdp-us-economy-coronavirus-pandemic-consumer-171558880.html', 'https://finance.yahoo.com/video/influencers-andy-serwer-bill-gates-110000273.html', 'https://finance.yahoo.com/news/jobless-claims-week-ending-july-25-123150219.html']
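As an alternative to the string-prefix check, the standard library's urllib.parse.urljoin can resolve each href against the page URL. The sketch below is only an illustration: the helper name absolutize and the sample list are made up here, and the hrefs are assumed to have already been collected as in the code above.

```python
from urllib.parse import urljoin

# Page the links were scraped from; relative hrefs are resolved against it.
BASE_URL = "https://finance.yahoo.com/topic/stock-market-news"


def absolutize(hrefs, base=BASE_URL):
    """Hypothetical helper: return each href as an absolute URL.

    urljoin leaves absolute URLs unchanged and resolves relative ones
    (such as /news/...) against the base URL.
    """
    return [urljoin(base, href) for href in hrefs]


# Sample hrefs taken from the question, one relative and one absolute.
sample = [
    "/news/stock-market-news-live-july-30-2020-221505732.html",
    "https://finance.yahoo.com/news/jobless-claims-week-ending-july-25-123150219.html",
]

for url in absolutize(sample):
    print(url)
```

Unlike the startswith test, this also leaves an absolute link to a different host untouched instead of prefixing it with the Yahoo Finance root.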
bigbounty
  • Thanks, that works. Could you also help me with getting more links from the first page? I don't know how to simulate the loading triggered by scrolling down. – shaked migdal Jul 30 '20 at 13:25
  • I would request you to post one more question as the question context is different – bigbounty Jul 30 '20 at 13:33
  • I will as soon as SO lets me. Apparently I need to "rest a while" before posting another question. (Also, NO REST is needed, just MORE COFFEE.) – shaked migdal Jul 30 '20 at 13:35