
I am trying to scrape the first 200 rows from a webpage. For now I only want to print the scraped data before loading it into a data frame. However, my code keeps raising errors or sometimes returns an empty list. Also, when the XPath in the containers variable is split into two expressions by its attribute, only the first 77 rows are scraped.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time

url = 'https://coinmarketcap.com/all/views/all/'
path= xxxxxxxxx

service=Service(executable_path=path)
driver=webdriver.Chrome(service=service)
driver.get(url)
time.sleep(60)

containers = driver.find_elements(by='xpath',value='//tr[@class="cmc-table-row" and (@style="display: table-row" or @style="display: table-row;")]')

Ranks = []
Names = []
Images = []
Symbols = []
MarketCaps = []
Prices = []
CirculatingSupplys = [] 
Volume24hs  = []
Hr_percnt_1s = []
Hr_percnt_24s = []
day_percent_7s = []

for i in containers:
    Rank = i.find_element(by='xpath',value='./td/div').text 
    Name = i.find_element(by='xpath',value='./td/div/a[2]').text
    Symbol = i.find_element(by='xpath',value='./td[3]/div').text        
    MarketCap = i.find_element(by='xpath',value='./td/p/span[2]').text
    Price = i.find_element(by='xpath',value='./td[5]/div/a/span').text
    CirculatingSupply = i.find_element(by='xpath',value='./td[6]/div').text 
    Volume24h   = i.find_element(by='xpath',value='./td[7]/a').text
    Hr_percnt_1 = i.find_element(by='xpath',value='./td[8]/div').text
    Hr_percnt_24 = i.find_element(by='xpath',value='./td[9]/div').text
    day_percent_7 = i.find_element(by='xpath',value='./td[10]/div').text

    Ranks.append(Rank)
    Names.append(Name)
    Symbols.append(Symbol)
    MarketCaps.append(MarketCap)
    Prices.append(Price)
    CirculatingSupplys.append(CirculatingSupply)    
    Volume24hs.append(Volume24h)
    Hr_percnt_1s.append(Hr_percnt_1) 
    Hr_percnt_24s.append(Hr_percnt_24)
    day_percent_7s.append(day_percent_7)

print(Ranks)
print(Names)
print(Symbols)
print(MarketCaps)
print(Prices)
print(CirculatingSupplys)   
print(Volume24hs)
print(Hr_percnt_1s)
print(Hr_percnt_24s)
print(day_percent_7s)

driver.quit()
Ajeet Verma

2 Answers


You can parse the initial JSON data that the site embeds in the HTML page:

import json

import pandas as pd
import requests
from bs4 import BeautifulSoup


url = "https://coinmarketcap.com/all/views/all/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = json.loads(soup.select_one("#__NEXT_DATA__").text)
data = json.loads(data["props"]["initialState"])

columns = data["cryptocurrency"]["listingLatest"]["data"][0]["keysArr"]
table = data["cryptocurrency"]["listingLatest"]["data"][1:]

df = pd.DataFrame([row[:-2] for row in table], columns=columns)
print(df.to_markdown(index=False))

Prints:

circulatingSupply cmcRank dateAdded hasFilters id isActive isAudited lastUpdated marketPairCount maxSupply name quote.USD.dominance quote.USD.fullyDilluttedMarketCap quote.USD.lastUpdated quote.USD.marketCap quote.USD.marketCapByTotalSupply quote.USD.name quote.USD.percentChange1h quote.USD.percentChange24h quote.USD.percentChange30d quote.USD.percentChange60d quote.USD.percentChange7d quote.USD.percentChange90d quote.USD.price quote.USD.turnover quote.USD.volume24h quote.USD.ytdPriceChangePercentage rank selfReportedCirculatingSupply slug symbol tags.0 tags.1 tags.10 tags.11 tags.12 tags.13 tags.14 tags.15 tags.16 tags.17 tags.18 tags.19 tags.2 tags.20 tags.21 tags.22 tags.23 tags.24 tags.25 tags.26 tags.27 tags.28 tags.3 tags.4 tags.5 tags.6 tags.7 tags.8 tags.9 totalSupply tvl
1.94445e+07 1 2010-07-13T00:00:00.000Z False 1 1 False 2023-07-31T21:54:00.000Z 10379 2.1e+07 Bitcoin 48.2613 6.13546e+11 2023-07-31T21:54:00.000Z 5.68099e+11 5.68099e+11 USD 0.0751799 -0.0335935 -4.53183 8.59029 0.209653 1.86047 29216.5 0.0208523 1.18462e+10 75.7375 1 0 bitcoin BTC mineable pow boostvc-portfolio cms-holdings-portfolio dcg-portfolio dragonfly-capital-portfolio electric-capital-portfolio fabric-ventures-portfolio framework-ventures-portfolio galaxy-digital-portfolio huobi-capital-portfolio alameda-research-portfolio sha-256 a16z-portfolio 1confirmation-portfolio winklevoss-capital-portfolio usv-portfolio placeholder-ventures-portfolio pantera-capital-portfolio multicoin-capital-portfolio paradigm-portfolio bitcoin-ecosystem store-of-value state-channel coinbase-ventures-portfolio three-arrows-capital-portfolio polychain-capital-portfolio binance-labs-portfolio blockchain-capital-portfolio 1.94445e+07 nan
1.21071e+08 2 2015-08-07T00:00:00.000Z False 1027 1 False 2023-07-31T21:54:00.000Z 7200 nan Ethereum 19.0769 2.2456e+11 2023-07-31T21:54:00.000Z 2.2456e+11 2.2456e+11 USD 0.0502369 -0.388847 -3.51936 -0.763876 0.343046 -0.957346 1854.78 0.0200434 4.50094e+09 54.4405 2 0 ethereum ETH pos smart-contracts dcg-portfolio dragonfly-capital-portfolio electric-capital-portfolio fabric-ventures-portfolio framework-ventures-portfolio hashkey-capital-portfolio kenetic-capital-portfolio huobi-capital-portfolio alameda-research-portfolio a16z-portfolio ethereum-ecosystem 1confirmation-portfolio winklevoss-capital-portfolio usv-portfolio placeholder-ventures-portfolio pantera-capital-portfolio multicoin-capital-portfolio paradigm-portfolio injective-ecosystem layer-1 coinbase-ventures-portfolio three-arrows-capital-portfolio polychain-capital-portfolio binance-labs-portfolio blockchain-capital-portfolio boostvc-portfolio cms-holdings-portfolio 1.21071e+08 nan
8.38193e+10 3 2015-02-25T00:00:00.000Z False 825 1 True 2023-07-31T21:54:00.000Z 58242 nan Tether USDt 7.1199 8.66969e+10 2023-07-31T21:54:00.000Z 8.38102e+10 8.66969e+10 USD 0.00270599 -0.0104449 -0.0463599 -0.0570385 -0.016347 -0.0718636 0.999892 0.221416 1.85569e+10 0.02 3 0 tether USDT payments stablecoin optimism-ecosystem asset-backed-stablecoin avalanche-ecosystem solana-ecosystem arbitrum-ecosytem moonriver-ecosystem injective-ecosystem bnb-chain usd-stablecoin 8.67063e+10 nan
1.53855e+08 4 2017-07-25T00:00:00.000Z False 1839 1 True 2023-07-31T21:54:00.000Z 1551 nan BNB 3.155 3.71383e+10 2023-07-31T21:54:00.000Z 3.71383e+10 3.71383e+10 USD 0.223376 -0.250832 -2.75968 -20.9338 1.17205 -25.2036 241.385 0.0295489 1.0974e+09 -1.1272 4 0 bnb BNB marketplace centralized-exchange celsius-bankruptcy-estate payments smart-contracts alameda-research-portfolio multicoin-capital-portfolio bnb-chain layer-1 sec-security-token alleged-sec-securities 1.53855e+08 nan
5.26939e+10 5 2013-08-04T00:00:00.000Z False 52 1 True 2023-07-31T21:54:00.000Z 1017 1e+11 XRP 3.1253 6.98154e+10 2023-07-31T21:54:00.000Z 3.67884e+10 6.98074e+10 USD -0.417038 -0.951795 47.3769 37.2266 -0.659997 50.0338 0.698154 0.0405228 1.49077e+09 106.089 5 0 xrp XRP medium-of-exchange enterprise-solutions arrington-xrp-capital-portfolio galaxy-digital-portfolio a16z-portfolio pantera-capital-portfolio 9.99886e+10 nan
2.64388e+10 6 2018-10-08T00:00:00.000Z False 3408 1 False 2023-07-31T21:54:00.000Z 13499 nan USD Coin 2.246 2.64381e+10 2023-07-31T21:54:00.000Z 2.64381e+10 2.64381e+10 USD 0.00594249 -0.0057927 -0.0359594 -0.0286095 -0.0212605 -0.00772583 0.999976 0.106004 2.80254e+09 -0.0017 6 0 usd-coin USDC medium-of-exchange stablecoin asset-backed-stablecoin hedera-hashgraph-ecosystem fantom-ecosystem arbitrum-ecosytem moonriver-ecosystem bnb-chain usd-stablecoin optimism-ecosystem 2.64388e+10 nan
1.40403e+11 7 2013-12-15T00:00:00.000Z False 74 1 False 2023-07-31T21:54:00.000Z 760 nan Dogecoin 0.9219 1.08519e+10 2023-07-31T21:54:00.000Z 1.08519e+10 1.08519e+10 USD 0.223288 -0.967885 12.8639 7.70454 3.55617 -2.00884 0.0772912 0.031614 3.43072e+08 10.0628 7 0 dogecoin DOGE mineable pow scrypt medium-of-exchange memes payments doggone-doggerel bnb-chain 1.40403e+11 nan
3.50073e+10 8 2017-10-01T00:00:00.000Z False 2010 1 False 2023-07-31T21:54:00.000Z 886 4.5e+10 Cardano 0.9147 1.38403e+10 2023-07-31T21:54:00.000Z 1.07669e+10 1.10815e+10 USD -0.318404 -1.13681 5.93883 -16.1587 0.691922 -21.4402 0.307561 0.0179873 1.93668e+08 23.1375 8 0 cardano ADA dpos pos alleged-sec-securities platform research smart-contracts staking cardano-ecosystem cardano layer-1 sec-security-token 3.60303e+10 nan

...and so on.
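
The payload shape that the snippet relies on can also be tried offline. The sketch below uses only the standard library and a made-up miniature page: the HTML, the two sample rows, and the names `NextDataParser` and `rows_from_page` are illustrative assumptions, not CoinMarketCap's real payload. It mirrors the same steps: read the `__NEXT_DATA__` script, decode the doubly encoded `initialState`, take `keysArr` as the header, and drop the last two items of each row.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text of the <script id="__NEXT_DATA__"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.payload = ""

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.payload += data


def rows_from_page(html):
    """Return one dict per table row, keyed by the names in keysArr."""
    parser = NextDataParser()
    parser.feed(html)
    outer = json.loads(parser.payload)
    # initialState is itself a JSON string, so it is decoded a second time
    state = json.loads(outer["props"]["initialState"])
    listing = state["cryptocurrency"]["listingLatest"]["data"]
    columns = listing[0]["keysArr"]
    # Same slicing as above: the last two items of each row are not columns
    return [dict(zip(columns, row[:-2])) for row in listing[1:]]


# Hypothetical miniature payload with the same nesting as the real page
state = {"cryptocurrency": {"listingLatest": {"data": [
    {"keysArr": ["cmcRank", "name", "symbol"]},
    [1, "Bitcoin", "BTC", "extra-a", "extra-b"],
    [2, "Ethereum", "ETH", "extra-a", "extra-b"],
]}}}
page = {"props": {"initialState": json.dumps(state)}}
SAMPLE_HTML = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    + json.dumps(page)
    + "</script></body></html>"
)

records = rows_from_page(SAMPLE_HTML)
print(records)
```

If the site changes the layout of this embedded JSON, the key lookups are the part that would need adjusting.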

Andrej Kesely

To scrape any number of leading rows from the All Cryptocurrencies | CoinMarketCap page, you should ideally use WebDriverWait with visibility_of_all_elements_located(). You can use the following locator strategies:

  • Sample code for first 10 rows:

    driver.get("https://coinmarketcap.com/all/views/all/")
    wait = WebDriverWait(driver, 20)

    def column_texts(column, limit=10):
        # Wait until all cells of this column are visible, then keep the first `limit` texts
        cells = wait.until(EC.visibility_of_all_elements_located((By.XPATH, f"//div[@class='cmc-table__table-wrapper-outer']//table//td[contains(@class, 'cmc-table__cell--sort-by__{column}')]")))
        return [my_elem.text for my_elem in cells][:limit]

    Ranks = column_texts("rank")
    Names = column_texts("name")
    Symbols = column_texts("symbol")
    MarketCaps = column_texts("market-cap")
    Prices = column_texts("price")
    for i,j,k,l,m in zip(Ranks,Names,Symbols,MarketCaps,Prices):
      print(f"Rank {i} is {j} with Symbol {k} having MarketCap about {l} and price is {m}")
    driver.quit()
    
  • Console output:

    Rank 1 is Bitcoin with Symbol BTC having MarketCap about $568,492,505,623 and price is $29,236.64
    Rank 2 is Ethereum with Symbol ETH having MarketCap about $223,175,213,597 and price is $1,857.03
    Rank 3 is Tether USDt with Symbol USDT having MarketCap about $83,828,994,041 and price is $1.00
    Rank 4 is BNB with Symbol BNB having MarketCap about $37,167,422,893 and price is $241.57
    Rank 5 is XRP with Symbol XRP having MarketCap about $36,845,601,938 and price is $0.6992
    Rank 6 is USD Coin with Symbol USDC having MarketCap about $26,492,203,043 and price is $1.00
    Rank 7 is Dogecoin with Symbol DOGE having MarketCap about $10,915,973,423 and price is $0.07775
    Rank 8 is Cardano with Symbol ADA having MarketCap about $10,748,458,652 and price is $0.307
    Rank 9 is Solana with Symbol SOL having MarketCap about $9,605,784,826 and price is $23.75
    Rank 10 is TRON with Symbol TRX having MarketCap about $7,000,126,116 and price is $0.07811
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
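
Since the goal in the question is a data frame, the parallel lists can be turned into one record per row first. This is only a sketch: `parse_usd` and `to_records` are hypothetical helper names, and it assumes the market cap and price texts always look like the console output above (a `$` sign with comma separators).

```python
def parse_usd(text):
    """Turn a display string such as '$29,236.64' into a float."""
    return float(text.replace("$", "").replace(",", ""))


def to_records(ranks, names, symbols, market_caps, prices):
    """Combine the per-column lists into one dict per table row."""
    records = []
    for rank, name, symbol, cap, price in zip(ranks, names, symbols, market_caps, prices):
        records.append({
            "rank": int(rank),
            "name": name,
            "symbol": symbol,
            "market_cap_usd": parse_usd(cap),
            "price_usd": parse_usd(price),
        })
    return records


# Values copied from the first two rows of the console output above
records = to_records(
    ["1", "2"],
    ["Bitcoin", "Ethereum"],
    ["BTC", "ETH"],
    ["$568,492,505,623", "$223,175,213,597"],
    ["$29,236.64", "$1,857.03"],
)
print(records[0])
```

A list of dicts like this can be passed directly to `pandas.DataFrame(records)`.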
    
undetected Selenium
  • Thank you for the reply, I really appreciate it. I would love it if you can scrape for the first 100 rows then I could figure out the rest. I think I have a problem with exceeding the 77th row while carrying out web scraping procedures. Thank you! – Mubaraq Onipede Aug 01 '23 at 21:34
  • @MubaraqOnipede 10 or 77 or 200, the logic will remain the same. At most you have to scroll down periodically. – undetected Selenium Aug 01 '23 at 21:36
  • Thanks for helping out. I just wrote the code and it worked for the first 20 rows. I would appreciate it if I can get one that would scrape for me perhaps the first 100 rows. Thank you! – Mubaraq Onipede Aug 01 '23 at 21:51
  • I just tried for the first 50 by changing [:10] to [:50] but it printed for only the first 20 – Mubaraq Onipede Aug 01 '23 at 21:54
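
The periodic scrolling mentioned in the comments could look roughly like the sketch below. It assumes the remaining rows only get their contents once they are scrolled into view, which would match the symptom of getting just the first 20 rows. The function name `scroll_until_rows` and its parameters (`step_px`, `pause`, `max_scrolls`) are made up for illustration. It relies only on `find_elements` and `execute_script` of an already created `driver`.

```python
import time


def scroll_until_rows(driver, row_xpath, target, step_px=1500, pause=1.0, max_scrolls=50):
    """Scroll down step by step until at least `target` rows match `row_xpath`.

    Gives up after `max_scrolls` scrolls and returns whatever was found.
    """
    rows = driver.find_elements("xpath", row_xpath)
    for _ in range(max_scrolls):
        if len(rows) >= target:
            break
        # Scroll down by a fixed number of pixels and give the page time to render
        driver.execute_script("window.scrollBy(0, arguments[0]);", step_px)
        time.sleep(pause)
        rows = driver.find_elements("xpath", row_xpath)
    return rows[:target]


# Usage with a real driver (not executed here); row_xpath should match only
# rows whose cells are already filled in, which is an assumption about the page:
# rows = scroll_until_rows(driver, '//tr[@class="cmc-table-row"]', 200)
```

The step size and pause are guesses and may need tuning. If the row count stops growing before the target is reached, the function returns the rows it has instead of waiting forever.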