Is there a way to scrape the data from this chart using Python libraries like bs4 or requests?

I tried to look at the website source data but I don't see the data points anywhere in the HTML.

I saw one variable that changes as I move my mouse around the chart but I have no idea how that works.

https://infogram.com/world-container-index-1h17493095xl4zj

Any ideas on how I can download and save these datapoints?

anarchy

2 Answers

I was able to extract the script text that populates the chart and convert it to JSON.

After parsing, final_data holds the chart's JSON as ordinary Python dictionaries and lists, so you can extract whatever you need from it.

Here is the code:

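import json

import requests
from bs4 import BeautifulSoup

url = "https://infogram.com/world-container-index-1h17493095xl4zj"
resp = requests.get(url)
resp.raise_for_status()  # stop early on an HTTP error

soup = BeautifulSoup(resp.text, features="html.parser")

# Find the <script> tag whose text assigns window.infographicData.
prefix = "window.infographicData="
main_script = None
for script in soup.find_all("script"):
    text = script.string
    if text and prefix in text:
        main_script = text.strip()
        break

if main_script is None:
    raise ValueError("window.infographicData script not found")

# str.lstrip() removes a set of characters, not a prefix,
# so cut the prefix off by position instead.
main_script = main_script[main_script.index(prefix) + len(prefix):]
main_script = main_script.rstrip().rstrip(";")

final_data = json.loads(main_script)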
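
Once the JSON is parsed, one way to locate the data points without knowing the exact layout is to walk the nested structure and report anything that looks like a table (a list of rows). This is only a sketch: `is_table`, `find_tables`, and the `sample` dictionary are hypothetical names and made-up placeholder values, not the real structure of the Infogram payload.

```python
from typing import Any, Iterator, Tuple


def is_table(node: Any) -> bool:
    """True for a non-empty list of rows whose cells are not lists themselves."""
    return (
        isinstance(node, list)
        and len(node) > 0
        and all(isinstance(row, list) for row in node)
        and not any(isinstance(cell, list) for row in node for cell in row)
    )


def find_tables(node: Any, path: str = "$") -> Iterator[Tuple[str, list]]:
    """Yield (path, table) for every table-shaped value in nested JSON data."""
    if is_table(node):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_tables(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_tables(item, f"{path}[{index}]")


# Made-up stand-in for final_data; the real keys and values will differ.
sample = {
    "title": "demo",
    "elements": {
        "chart": {
            "data": [[["Date", "Index"], ["2021-06-17", 100], ["2021-06-24", 110]]]
        }
    },
}

for table_path, table in find_tables(sample):
    print(table_path, len(table), "rows")
```

Running `find_tables(final_data)` and reading the printed paths should show where the chart's rows live, after which you can index into that path directly.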


Ram
  • How would you extract the data from this, though? The formatting looks so strange. – anarchy Jun 23 '21 at 06:33
  • Can I ask how you found the script and window.infographicData? – anarchy Jun 23 '21 at 06:44
  • If you look at https://en.macromicro.me/charts/947/commodity-ccfi-scfi, I can't find similar information to extract. – anarchy Jun 23 '21 at 06:45
  • @anarchy Every site is different. You have to check whether the data is present in the HTML itself, is populated by JavaScript, or comes from an API. For example, https://en.macromicro.me/charts/947/commodity-ccfi-scfi gets its data from an API. – Ram Jun 23 '21 at 06:49
  • Is there a way to extract that? – anarchy Jun 23 '21 at 06:50
  • Yes. You can make a GET request directly to the API using the `requests` module and extract the data. – Ram Jun 23 '21 at 06:51
  • I posted another question about this if you want to check it out: https://stackoverflow.com/questions/68094929/how-to-extract-a-chart-that-gets-it-data-from-an-api-using-python – anarchy Jun 23 '21 at 06:55
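
For the API case mentioned in the comments, the general pattern is a GET request followed by JSON decoding. The sketch below assumes you have already found the endpoint in the browser's developer tools (Network tab); `fetch_chart_json` is a hypothetical helper, and the session is passed in so that a `requests.Session()` can be used in practice.

```python
from typing import Any, Mapping, Optional


def fetch_chart_json(
    session: Any,
    api_url: str,
    params: Optional[Mapping[str, str]] = None,
) -> Any:
    """GET a JSON endpoint and return the decoded response body.

    `session` is expected to behave like requests.Session: it needs a
    get(url, params=..., timeout=...) method returning a response with
    raise_for_status() and json().
    """
    response = session.get(api_url, params=params, timeout=30)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()
```

A call would look like `fetch_chart_json(requests.Session(), "https://example.com/api/chart-data")`, where the URL is a placeholder for whatever endpoint the page actually requests. Some APIs also expect particular headers or cookies, which you would need to copy from the browser request.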

You can find all the data in the script tag at the end of the page source: line 3794.

Fred L
  • Sorry, I don't understand. Which part is that? – anarchy Jun 23 '21 at 05:51
  • All the data is at the end of the page source; search for window.infographicData={. So you should write something like: if 'window.infographicData' in script.text. See this question for an example of how to extract the data: https://stackoverflow.com/questions/48030726/extracting-data-from-script-tag-using-beautifulsoup-in-python – Fred L Jun 23 '21 at 06:03
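
The search described in that comment can also be done without an HTML parser. The sketch below finds the `window.infographicData=` marker with a regular expression and lets `json.JSONDecoder.raw_decode` read exactly one JSON value from that position, so the trailing `;` and the rest of the page are ignored. `extract_infographic_data` and `sample_html` are hypothetical; the sample string only imitates the shape of the real page.

```python
import json
import re

# Matches the assignment and any whitespace before the JSON value starts.
MARKER = re.compile(r"window\.infographicData\s*=\s*")


def extract_infographic_data(html: str):
    """Return the JSON value assigned to window.infographicData in `html`."""
    match = MARKER.search(html)
    if match is None:
        raise ValueError("window.infographicData assignment not found")
    # raw_decode parses one JSON value starting at the given index and
    # returns it with the index where parsing stopped.
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


# Made-up page fragment standing in for the downloaded HTML.
sample_html = (
    "<html><body><script>"
    'window.infographicData={"title": "demo", "rows": [[1, 2], [3, 4]]};'
    "</script></body></html>"
)

print(extract_infographic_data(sample_html)["rows"])  # prints [[1, 2], [3, 4]]
```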