Python, Scraping BS4

Question

There are a lot of posts about this subject, but I still can't get what I want, so here is my problem:

I am trying to extract a stock price from this site: https://bors.e24.no/#!/instrument/NHY.OSE

and I would like to extract the price, 57,12, from the HTML shown in the browser inspector:

<div class="number LAST" data-reactid=".g.1.2.0">
57,12</div>

Here is the code I tried, which raises AttributeError: 'NoneType' object has no attribute 'text'.

I also tried removing .text from the PRICE line, and the result is 'Price is: None'.

from bs4 import BeautifulSoup
import requests
url = 'https://bors.e24.no/#!/instrument/NHY.OSE'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
PRICE= soup.find('div', class_= "number LAST").text
print('Price is:',(PRICE))

I believe that section of the page is generated with JavaScript on the client side, though I'm not sure. If that's the case, BeautifulSoup alone won't work. See this question: https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python – defiant, Sep 23 '22 at 13:49

score 0 · Answer 1 · answered Sep 23 '22 at 13:54

Try this:

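import requests

# HTTP header names are hyphenated: "User-Agent", not "user_agent".
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36',
}

# JSON endpoint that returns the quote columns listed in the query string.
api_url = "https://bors.e24.no/server/components?columns=ITEM, LAST, BID, ASK, CHANGE, CHANGE_PCT, TURNOVER, LONG_NAME&itemSector=NHY.OSE&type=table"
response = requests.get(api_url, headers=headers, timeout=10)
response.raise_for_status()  # fail loudly on an HTTP error status
data = response.json()
print(data["rows"][0]["values"]["LAST"])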

Output:

56.92
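
The one-line lookup above raises KeyError or IndexError if the endpoint returns no rows. Here is a minimal sketch of a more defensive lookup, assuming the response keeps the rows -> values -> LAST shape used above. The helper name extract_last and the sample payload are hypothetical and only for illustration:

```python
from typing import Any, Dict, Optional


def extract_last(data: Dict[str, Any]) -> Optional[float]:
    """Return the LAST price from a decoded response, or None if it is absent."""
    rows = data.get("rows") or []
    if not rows:
        return None
    value = rows[0].get("values", {}).get("LAST")
    return float(value) if value is not None else None


# Hypothetical payload mirroring the shape used in the answer above.
sample = {"rows": [{"values": {"ITEM": "NHY", "LAST": 56.92}}]}
print(extract_last(sample))        # 56.92
print(extract_last({"rows": []}))  # None
```

Returning None instead of raising lets the caller decide whether a missing quote is an error or just an empty result.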

answered Sep 23 '22 at 13:54

baduker

I am impressed and very thankful for your response! I could never have figured this out by myself! – Mats Seamont Sep 24 '22 at 11:34
Hi again! I've played around with the code you suggested and also checked other URLs. What I tried to find, without success, is the "Volum" (volume) at the end of the day. I can find the volume for individual trades but not the accumulated volume. Any suggestions? – Mats Seamont Sep 26 '22 at 13:20
Best to ask a new question @MatsSeamont. – baduker Sep 26 '22 at 13:31

score -1 · Answer 2 · answered Sep 23 '22 at 14:39

This happens because

requests.get(url)

does not return everything you see in the browser, including the price you are looking for. The page first loads a skeleton and only then fetches more data with JavaScript. Because of that, selecting the div with class "number LAST"

PRICE= soup.find('div', class_= "number LAST").text

raises an error: that div does not exist in the HTML that requests received, so soup.find returns None.
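
Part of the reason is visible in the URL itself. Everything after # is a fragment, which the browser keeps for client-side scripts and never sends to the server, so requests.get fetches the same bare page for every instrument. A small standard-library sketch that splits the URL from the question:

```python
from urllib.parse import urlsplit

url = "https://bors.e24.no/#!/instrument/NHY.OSE"
parts = urlsplit(url)

# Only scheme, host, path and query take part in the HTTP request.
requested = f"{parts.scheme}://{parts.netloc}{parts.path}"
print(requested)       # https://bors.e24.no/
print(parts.fragment)  # !/instrument/NHY.OSE
```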

There are a few ways to fix this problem:

You can use a library like Selenium, which is often recommended for scraping dynamic pages that rely on JavaScript and API calls to load content.
You can open your browser's developer tools and inspect the Network tab, where you might find the request that fetches the price you are trying to scrape.
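
For the Selenium route, here is a rough sketch rather than a tested recipe for this site. It assumes the selenium package and a local Chrome are installed, and that the rendered page still contains the div with class "number LAST" shown in the question. The function names parse_price and fetch_rendered_price are made up for this example:

```python
def parse_price(text: str) -> float:
    """Convert a Norwegian-formatted number such as '57,12' to a float."""
    cleaned = text.strip().replace("\xa0", "").replace(" ", "")
    return float(cleaned.replace(",", "."))


def fetch_rendered_price(url: str, timeout: float = 15.0) -> float:
    """Open the page in headless Chrome and read the rendered price."""
    # Imported here so parse_price stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Poll until JavaScript has inserted the element and filled in its text.
        # Assumption: the class names match the HTML quoted in the question.
        text = WebDriverWait(driver, timeout).until(
            lambda d: d.find_element(By.CSS_SELECTOR, "div.number.LAST").text.strip()
        )
        return parse_price(text)
    finally:
        driver.quit()


# Usage (needs Chrome and the selenium package):
# print(fetch_rendered_price("https://bors.e24.no/#!/instrument/NHY.OSE"))
```

Driving a full browser is much slower than calling the JSON endpoint directly, so it is usually the fallback when no such endpoint can be found in the Network tab.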

I believe that in your case, after taking a look at the Network tab myself, the right URL to request could be 'https://bors.e24.no/server/components?columns=TIME,+PRICE,+VOLUME,+BUYER,+SELLER,+ID&filter=ITEM%3D%3DsNHY&limit=5&source=feed.ose.trades.EQUITIES%2BPCC&type=history', which seems to return a dictionary with the price you are looking for.

import requests
url = 'https://bors.e24.no/server/components?columns=TIME,+PRICE,+VOLUME,+BUYER,+SELLER,+ID&filter=ITEM%3D%3DsNHY&limit=5&source=feed.ose.trades.EQUITIES%2BPCC&type=history'
page = requests.get(url)
print(page.json()["rows"][0]["values"]["PRICE"])

If you want to scrape several stocks, you will need to build this URL dynamically for each one. I guess that means replacing "NHY" and "ose" with values that match the other stocks you are looking for.
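
One way to build such URLs, sketched with the standard library only. The parameter values are decoded from the URL above. Whether the "s" prefix in the filter and the lowercase exchange name in the source carry over to other stocks is an assumption, and build_history_url is a hypothetical helper:

```python
from urllib.parse import urlencode

BASE_URL = "https://bors.e24.no/server/components"


def build_history_url(ticker: str, exchange: str = "ose", limit: int = 5) -> str:
    """Build a trade-history URL shaped like the one in the answer."""
    params = {
        "columns": "TIME, PRICE, VOLUME, BUYER, SELLER, ID",
        # Assumption: the "s" prefix in "ITEM==sNHY" is kept for every ticker.
        "filter": f"ITEM==s{ticker.upper()}",
        "limit": limit,
        # Assumption: only the exchange part of the feed name changes.
        "source": f"feed.{exchange.lower()}.trades.EQUITIES+PCC",
        "type": "history",
    }
    # urlencode percent-encodes "=", "+" and "," and turns spaces into "+".
    return f"{BASE_URL}?{urlencode(params)}"


print(build_history_url("NHY"))
```

The result can be passed to requests.get in place of the hard-coded URL. If a ticker returns no rows, the substitution pattern probably differs for that instrument, and the Network tab is the place to check.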
