Due to limitations on historical data on the coinmarketcap api plans, I am seeking to webscrape instead.

However, I am stuck at the first hurdle despite reading the crummy documentation on attributes.

import json 
import requests 
from bs4 import BeautifulSoup


r = requests.get('https://coinmarketcap.com/historical/20210905/')
soup = BeautifulSoup(r.text, 'lxml')
print(soup)

Contained in the output is the data which I am trying to scrape. The data I am trying to get:

Market Cap, Price and Circulating Supply for BTC at 5th September 2021.

The data appears in the output soon after <script id="__NEXT_DATA__" type="application/json"> and for this reason I thought that using __NEXT_DATA__ as the attribute id would allow me to access the data. Unfortunately not.

An example of the data structure where the data is contained looks as follows:

"listingHistorical":{"data":[{"id":1,"name":"Bitcoin","symbol":"BTC","slug":"bitcoin","num_market_pairs":8848,"date_added":"2013-04-28T00:00:00.000Z","tags":["mineable","pow","sha-256","store-of-value","state-channels","coinbase-ventures-portfolio","three-arrows-capital-portfolio","polychain-capital-portfolio","binance-labs-portfolio","arrington-xrp-capital","blockchain-capital-portfolio","boostvc-portfolio","cms-holdings-portfolio","dcg-portfolio","dragonfly-capital-portfolio","electric-capital-portfolio","fabric-ventures-portfolio","framework-ventures","galaxy-digital-portfolio","huobi-capital","alameda-research-portfolio","a16z-portfolio","1confirmation-portfolio","winklevoss-capital","usv-portfolio","placeholder-ventures-portfolio","pantera-capital-portfolio","multicoin-capital-portfolio","paradigm-xzy-screener"],"max_supply":21000000,"circulating_supply":18807550,"total_supply":18807550,"platform":null,"cmc_rank":1,"last_updated":"2021-09-05T23:00:00.000Z","quote":{"BTC":{"price":1,"volume_24h":585906.8067215424,"percent_change_1h":0,"percent_change_24h":0,"percent_change_7d":0,"market_cap":18807550,"fully_diluted_market_cap":null,"last_updated":"2021-09-05T23:59:03.000Z"},"USD":{"price":51753.41192620951,"volume_24h":30322676318.63,"percent_change_1h":-0.159917099159,"percent_change_24h":3.621580803777,"percent_change_7d":5.987281074996,"market_cap":973354882472.7817,"last_updated":"2021-09-05T23:00:00.000Z"}},"rank":1,"noLazyLoad":true},

Is there a simple solution for this?

Diop Chopra
  • “*limitations on historical data on the coinmarketcap api plans*” Are you sure about this? The [Pricing](https://coinmarketcap.com/api/pricing/) page details that historical data is available via their API. – esqew Sep 13 '21 at 00:16
  • A bit tangential as well, but you should know that what you’re trying to do is expressly prohibited by the target site’s [Terms of Service](https://coinmarketcap.com/terms/) (“*Use or introduce to the Service any data mining, crawling, "scraping", robot or similar automated or data gathering or extraction method,*”). Your access could be blocked entirely once the provider detects this activity, breaking any workflow or app you build on top of this extraction. – esqew Sep 13 '21 at 00:19
  • Duplicate: [How to extract json from script tag using beautiful soup python?](https://stackoverflow.com/questions/61217541/how-to-extract-json-from-script-tag-using-beautiful-soup-python) – esqew Sep 13 '21 at 00:22
  • Thanks for the heads up. I did not realise I would have been breaching the target site's Terms of Service. I have now found an alternative solution using the coingecko API. – Diop Chopra Sep 13 '21 at 12:13

2 Answers

This is just for the listing table, which is fully loaded on the page.

The last path segment of https://coinmarketcap.com/historical/20210905/ is the date in YYYYMMDD form (20210905 is 2021-09-05). Replace it with the desired date, for example https://coinmarketcap.com/historical/20210101/, then scrape the page and extract the JSON data.
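As a minimal sketch of that date substitution (the helper name `historical_url` and the constant `BASE_URL` are made up for illustration, not part of any library), the URL can be built from a `datetime.date`:

```python
from datetime import date

# Base path taken from the URLs quoted in this answer
BASE_URL = "https://coinmarketcap.com/historical/"


def historical_url(day: date) -> str:
    """Return the snapshot URL for `day`, with the date formatted as YYYYMMDD."""
    return f"{BASE_URL}{day.strftime('%Y%m%d')}/"


print(historical_url(date(2021, 9, 5)))
print(historical_url(date(2021, 1, 1)))
```

Formatting with `strftime('%Y%m%d')` keeps the zero padding for single-digit months and days, which the URL format relies on.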

You can try something like this:

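import json
import requests
from bs4 import BeautifulSoup

r = requests.get('https://coinmarketcap.com/historical/20210905/')
soup = BeautifulSoup(r.text, 'lxml')

# The script tag shown in the question has type application/json, not application/ld+json
data = json.loads(soup.find('script', type='application/json', id='__NEXT_DATA__').string)

# Assumes listingHistorical is a top-level key; it may sit deeper in the JSON
historical_data = data['listingHistorical']['data']
print(historical_data)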
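If `listingHistorical` turns out not to be a top-level key of the parsed JSON, a recursive search avoids hard-coding the path. The sketch below is one way to do it, not a confirmed description of the page layout: `find_key` and `coin_summary` are hypothetical helper names, and the `props`/`initialState` nesting in the sample is invented for the demonstration. The field names and values come from the fragment quoted in the question.

```python
import json


def find_key(node, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def coin_summary(next_data, symbol="BTC", currency="USD"):
    """Pick price, market cap and circulating supply for one coin."""
    listing = find_key(next_data, "listingHistorical")
    if listing is None:
        raise KeyError("listingHistorical not found")
    for coin in listing["data"]:
        if coin["symbol"] == symbol:
            quote = coin["quote"][currency]
            return {
                "price": quote["price"],
                "market_cap": quote["market_cap"],
                "circulating_supply": coin["circulating_supply"],
            }
    raise LookupError(f"{symbol} not found in listing")


# Trimmed sample; the outer "props"/"initialState" nesting is hypothetical
sample = json.loads("""
{"props": {"initialState": {"listingHistorical": {"data": [
  {"symbol": "BTC", "circulating_supply": 18807550,
   "quote": {"USD": {"price": 51753.41192620951,
                     "market_cap": 973354882472.7817}}}
]}}}}
""")

print(coin_summary(sample))
```

In practice `sample` would be replaced by the dictionary parsed from the `__NEXT_DATA__` script tag.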
SkyzohKey