
Sorry in advance for any bad formatting; this is my very first post!

I'm trying to create a program that scrapes CoinMarketCap and compares the prices on a South African exchange (Luno) with those on all the other Bitcoin exchanges.
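Once the prices have been scraped, the comparison itself is simple arithmetic. Below is a minimal sketch; `luno_premium` is a hypothetical helper, and it assumes every price has already been converted to one common currency (Luno quotes Bitcoin in ZAR, so its price would need converting first).

```python
def luno_premium(luno_price, other_prices):
    """Return how many percent the Luno price is above (+) or below (-)
    each other exchange's price.

    Assumes all prices are in the same currency; `other_prices` maps an
    exchange name to its price.
    """
    return {
        # Multiply before dividing so round inputs give exact results.
        exchange: (luno_price - price) * 100 / price
        for exchange, price in other_prices.items()
    }
```

For example, `luno_premium(110.0, {"A": 100.0, "B": 125.0})` gives `{"A": 10.0, "B": -12.0}`: Luno is 10% above exchange A and 12% below exchange B (the exchange names and prices are made-up illustrative values).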

Unfortunately, it doesn't work on the https://coinmarketcap.com/de/currencies/bitcoin/markets/ page, although it does work on the https://coinmarketcap.com/de/exchanges/luno/ page.

Any suggestions? Here is my code:

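from bs4 import BeautifulSoup
import requests

def scrapeWebsite(link):
    # Pretend to be a desktop browser; some sites reject the default requests User-Agent
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

    results = requests.get(link, headers=headers)
    results.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page

    src = results.content

    soup = BeautifulSoup(src, features="html.parser")

    items = []

    # Debug output: the HTML exactly as the server sent it
    print(soup.prettify())

    for tr in soup.find_all("tr"):
        line = ""
        for td in tr.find_all("td"):
            # Build "cell1/cell2/.../" for the current row
            line = line + td.text + "/"
            # "Kürzlich" ("Recently") is the last-updated value on the German page
            if td.text == "Kürzlich":
                items.append(line)
    return items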



itemsLuno = scrapeWebsite("https://coinmarketcap.com/de/currencies/bitcoin/markets/")

# Coins on Luno are: Bitcoin, Ethereum, Litecoin and Ripple

for item in itemsLuno:
    print(item)
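A possible restructuring is to separate the parsing from the fetching, so the same row-matching logic can be fed HTML from `requests` or from any other source of page HTML. This is a minimal sketch; `parse_rows` is a hypothetical helper, and unlike the loop above it returns each matching row as a whole list of stripped cell texts rather than a "/"-joined string cut off at the marker cell.

```python
from bs4 import BeautifulSoup


def parse_rows(html, marker="Kürzlich"):
    """Return every table row (as a list of cell texts) that contains `marker`.

    `marker` defaults to "Kürzlich" ("Recently"), the last-updated value the
    original loop matches on the German CoinMarketCap pages.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Strip whitespace so padded cells still compare equal to the marker.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if marker in cells:
            rows.append(cells)
    return rows
```

With this split, `parse_rows(requests.get(link, headers=headers).text)` reproduces the current behaviour, and any other HTML string can be passed in the same way.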

1 Answer


The content of the first page (the markets page) is generated by JavaScript, so when you fetch it you get the initial, unmodified HTML: the response the server sends before your browser executes any of the JS. Check this response here.
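One way to check this yourself is to count the table rows in the HTML the server actually returns and compare that with what the browser shows. A minimal sketch using only the standard library's `html.parser`; `count_table_rows` is a hypothetical helper, and what it reports for the live page depends on CoinMarketCap's markup at the time you run it.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this hook.
        if tag == "tr":
            self.rows += 1


def count_table_rows(html):
    counter = RowCounter()
    counter.feed(html)
    counter.close()
    return counter.rows
```

Passing it `requests.get(url, headers=headers).text` for the markets page and getting far fewer rows than the browser displays (or none) would indicate that the table is filled in client-side by JavaScript.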
In your case, you need to render the JavaScript content before you crawl the page. You can do that with Selenium, or with the Scrapy framework combined with a JavaScript renderer (Scrapy on its own does not execute JavaScript). For example, in Selenium:

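from selenium import webdriver
from bs4 import BeautifulSoup
import time

url = "https://coinmarketcap.com/de/currencies/bitcoin/markets/"
driver = webdriver.Firefox()  # needs Firefox installed
driver.get(url)
time.sleep(5)  # give the page's JavaScript time to build the table
html = driver.page_source  # the rendered HTML, including the table rows
driver.quit()
soup = BeautifulSoup(html, "html.parser")  # parse exactly as before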
Belhadjer Samir