
I am trying to access the data behind the graph on this page: https://www.prisjakt.nu/produkt.php?pu=5183925

I am able to extract data from the table below the graph, but I am unable to fetch the data from the graph itself, which is rendered dynamically by JavaScript. I know that BeautifulSoup alone is not sufficient here. I tried poking around in the browser console to see the contents of the graph, but without success.

I also looked at view-source:https://www.prisjakt.nu/produkt.php?pu=5183925 to see how the graph is being rendered:

<div class="graph" data-testid="graph" data-test="PriceHistoryGraph">

I am trying to print the price history of an item from the website, for example something like the snippet below, which is JSON that I found via "view source".

"nodes":[{"date":"2019-09-10","lowestPrice":13195},{"date":"2019-09-11","lowestPrice":12990},{"date":"2019-09-12","lowestPrice":12990},

I suspect that the same data is also held in

<rect class="vx-bar" ...... data="[Object Object][Object Object][Object Object]..."

where data would be a list of two-element arrays, similar to the "nodes" snippet above. Is that right?

Here is the simple code I am using at the moment, to give a brief idea. It prints the entire layout, including the graph and the table below it.

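from time import sleep
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

my_url = 'https://www.prisjakt.nu/produkt.php?pu=5183925'
headers = {'User-Agent': 'Mozilla/5.0'}  # assumed value; not shown in the original post
driver = webdriver.Chrome()  # assumed browser; the original post does not show the driver setup
driver.get(my_url)
sleep(3)  # crude wait for the JavaScript to render

page = requests.get(my_url, headers=headers)  # separate fetch; does not use the Selenium-rendered page
soup = BeautifulSoup(page.content, 'html.parser')  # BeautifulSoup no longer shadowed by the variable
data = soup.find_all(id="statistics")
print(data)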

Any suggestions with an example or a solution would help me. Thanks in Advance!

Revanth Tv

1 Answer


You're right, the graph is being constructed dynamically, but you can easily grab that data.

Here's how:

import requests

response = requests.get('https://www.prisjakt.nu/_internal/graphql?release=2020-11-20T07:33:45Z|db08e4bc&version=6f2bf5&main=product&variables={"id":5183925,"offset":0,"section":"statistics","statisticsTime":"1970-01-02","marketCode":"se","personalizationExcludeCategories":[],"userActions":true,"badges":true,"media":true,"campaign":true,"relatedProducts":true,"campaignDeals":true,"priceHistory":true,"recommendations":true,"campaignId":2,"personalizationClientId":"","pulseEnvironmentId":"sdrn:schibsted:environment:undefined"}').json()


for node in response["data"]["product"]["statistics"]["nodes"]:
    print(f"{node['date']} - {node['lowestPrice']}")

Output:

2019-09-10 - 13195
2019-09-11 - 12990
2019-09-12 - 12990
2019-09-13 - 12605
2019-09-14 - 12605
2019-09-15 - 12605
2019-09-16 - 12970
2019-09-17 - 12970
2019-09-18 - 12970
2019-09-19 - 12969
2019-09-20 - 12969
2019-09-21 - 12969
2019-09-22 - 12969
2019-09-23 - 9195
2019-09-24 - 12970
and so on...
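
If you want to work with the history rather than just print it, here is a minimal sketch. It assumes the decoded response keeps the data -> product -> statistics -> nodes layout used above and that every node carries a numeric lowestPrice. The helper names parse_price_history and cheapest_day are my own, and the sample payload is built by hand from a few rows of the output above so the snippet runs without touching the site.

```python
from datetime import date


def parse_price_history(payload):
    """Turn a decoded response dict into a list of (date, lowest_price) tuples."""
    nodes = payload["data"]["product"]["statistics"]["nodes"]
    return [(date.fromisoformat(n["date"]), n["lowestPrice"]) for n in nodes]


def cheapest_day(history):
    """Return the (date, price) tuple with the lowest price."""
    return min(history, key=lambda item: item[1])


# Hand-built sample in the assumed response shape, using rows from the output above.
sample = {"data": {"product": {"statistics": {"nodes": [
    {"date": "2019-09-10", "lowestPrice": 13195},
    {"date": "2019-09-11", "lowestPrice": 12990},
    {"date": "2019-09-23", "lowestPrice": 9195},
]}}}}

history = parse_price_history(sample)
print(cheapest_day(history))  # (datetime.date(2019, 9, 23), 9195)
```

With the live request you would pass the decoded JSON in place of sample. If some nodes turn out to have no price, filter them out before calling min.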
baduker
  • Wow, you are super fast. As a beginner, I am glad I was looking in the right places, but I didn't know the technique for accessing and fetching the JSON. Now I do. How did you find the URL you used for 'response'? – Revanth Tv Nov 20 '20 at 09:19
  • Pro tip: learn to use your browser's Developer Tools. I found the request link under Developer Tools -> Network -> XHR tab. – baduker Nov 20 '20 at 09:25
  • I found it in Developer Tools -> Network -> XHR tab -> Headers. Thanks! :) – Revanth Tv Nov 20 '20 at 09:42
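
For anyone repeating the Network -> XHR step from the comments: once you have copied a request URL, a rough sketch like the one below can help you see which parameters it carries before you reuse it. It uses only the standard library. The helper name decode_graphql_url is made up for illustration, and the URL is a shortened form of the one in the answer (several parameters dropped for brevity), not a URL the site is guaranteed to accept.

```python
import json
from urllib.parse import parse_qs, urlsplit


def decode_graphql_url(url):
    """Split a copied request URL into its 'main' name and decoded 'variables' dict."""
    query = parse_qs(urlsplit(url).query)
    variables = json.loads(query["variables"][0])
    return query["main"][0], variables


# Shortened illustration of the answer's URL; the real one carries more parameters.
url = (
    "https://www.prisjakt.nu/_internal/graphql?version=6f2bf5&main=product"
    '&variables={"id":5183925,"section":"statistics","priceHistory":true}'
)

operation, variables = decode_graphql_url(url)
print(operation, variables["id"], variables["section"])  # product 5183925 statistics
```

Seeing the variables as a dict makes it easier to guess which ones matter. Here id looks like the product number from the page URL, though that is an inference from the values, not something the site documents.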