Web scraping multiple pages with count and offset parameters in the URL

Question

I am trying to scrape cryptocurrency tickers from the Yahoo Finance website. However, I can't solve the pagination problem. I have tried to loop over the count and offset parameters in the URL this way:


import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

list_ticker = []
for page in range(8):
    # count = rows per page, offset = index of the first row on that page
    url = f'https://finance.yahoo.com/cryptocurrencies?count=25&offset={page * 25}'
    r = requests.get(url)
    soup = bs(r.text, "html.parser")
    # Search the page once and take every match (there may be fewer than 25)
    for link in soup.find_all('a', {'class': 'Fw(600) C($linkColor)'}):
        list_ticker.append(link.text)

It does not work. Can I solve this problem using this library?

Thanks in advance!
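
One way to narrow down a problem like this is to check whether the server actually honours the offset, or keeps returning the same rows. The following is only a sketch: `collect_pages` and `fetch_page` are hypothetical names, and `fetch_page(offset, count)` is assumed to return the list of tickers found on one page.

```python
def collect_pages(fetch_page, count=25, max_pages=8):
    """Collect tickers page by page using count/offset pagination.

    fetch_page is a caller-supplied function (hypothetical here) that takes
    (offset, count) and returns the list of tickers on that page.
    """
    tickers = []
    previous = None
    for page in range(max_pages):
        batch = fetch_page(page * count, count)
        if not batch:
            # An empty page means there is nothing left to read.
            break
        if batch == previous:
            # Same rows twice in a row: the offset is probably being ignored,
            # e.g. because the table is filled in by a later request.
            raise RuntimeError("page repeated: offset appears to be ignored")
        tickers.extend(batch)
        previous = batch
    return tickers
```

Here `fetch_page` could wrap the `requests.get` and `soup.find_all` calls from the snippet above. If the error is raised on the second page, plain HTML scraping with these URL parameters is unlikely to get past the first page.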

Yahoo Finance probably has some anti-scraping protection, and after the page loads there are further calls that fetch the data. I didn't find a problem in your code. You can see the requests in Chrome dev tools > Network > Fetch/XHR after you hit the "Next" button on the website. The requests had names like spark?symbols.... and screener?.... I would suggest using an API to get the data you want :) - darthbane426, Jul 01 '22 at 11:51
Thanks @darthbane426, I did spot two requests named spark?symbol, and they did come right after a screener? one. Does that mean I cannot scrape more than one page of data? - Nicolas Zimnovitch, Jul 01 '22 at 12:22
Also, I was wondering what kind of API you were referring to. Should I make one myself? I am really not that familiar with those subjects. I have tried the Yahoo Finance API for Python and it has no such features. - Nicolas Zimnovitch, Jul 01 '22 at 12:23
There is probably a way to call those "spark?.." or "screener?" requests with the POST method, but it would take more calls for every page, plus JSON parsing. See this post (a quite similar problem): [link](https://stackoverflow.com/questions/39218742/using-beautifulsoup-to-search-through-yahoo-finance). Here you can find APIs for crypto information with Python examples: [link](https://towardsdatascience.com/top-5-best-cryptocurrency-apis-for-developers-32475d2eb749) - darthbane426, Jul 01 '22 at 12:38
No problem ;) I've found another solution that uses Selenium. Here is the how-to (chapter 2), but I still recommend using the APIs because they are easier to maintain. [link](https://blog.jovian.ai/web-scraping-yahoo-finance-using-python-7c4612fab70c) - darthbane426, Jul 01 '22 at 12:47
Alright, I followed your advice and finally went with the Finnhub API! I was able to retrieve a list of every ticker symbol by exchange. I chose Binance as a reference! - Nicolas Zimnovitch, Jul 01 '22 at 14:31
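
For readers who land here, a listing of symbols by exchange along the lines of the last comment might look roughly like the sketch below. The thread does not show the code that was used, so everything here is an assumption: the endpoint path, the `exchange` and `token` query parameters, and the `symbol` field in the response should all be checked against Finnhub's own documentation, and the function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for Finnhub's crypto symbol listing; verify in their docs.
FINNHUB_URL = "https://finnhub.io/api/v1/crypto/symbol"


def build_url(exchange, token):
    """Build the request URL for one exchange (parameter names are assumed)."""
    return f"{FINNHUB_URL}?{urlencode({'exchange': exchange, 'token': token})}"


def extract_symbols(payload):
    """Return the sorted, de-duplicated 'symbol' values from the decoded JSON.

    payload is assumed to be a list of dicts, each with a 'symbol' key.
    """
    return sorted({item["symbol"] for item in payload if "symbol" in item})


def fetch_crypto_symbols(exchange, token, opener=urlopen):
    """Download and parse the symbol list; opener is injectable for testing."""
    with opener(build_url(exchange, token), timeout=10) as resp:
        return extract_symbols(json.load(resp))
```

A call such as `fetch_crypto_symbols("binance", "YOUR_API_KEY")` needs a personal API key in place of the placeholder. A single JSON request like this avoids the pagination problem entirely, which is the maintenance advantage mentioned in the comments.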
